ABSTRACT

Aims. We study the validity of the approximation of a Gaussian cosmic shear likelihood. We estimate the true likelihood for a fiducial 
cosmological model from a large set of ray-tracing simulations and investigate the impact of non-Gaussianity on cosmological parameter 
estimation. We investigate how odd the recently reported very low value of $\sigma_8$ really is as derived from the Chandra Deep Field South (CDFS)
using cosmic shear by taking the non-Gaussianity of the likelihood into account as well as the possibility of biases coming from the way the 
CDFS was selected. 

Methods. A brute force approach to estimating the likelihood from simulations must fail because of the high dimensionality of the problem. 
We therefore use independent component analysis to transform the cosmic shear correlation functions to a new basis, in which the likelihood 
approximately factorises into a product of one-dimensional distributions. 

Results. We find that the cosmic shear likelihood is significantly non-Gaussian. This leads to both a shift of the maximum of the posterior 
distribution and a significantly smaller credible region compared to the Gaussian case. We re-analyse the CDFS cosmic shear data using the 
non-Gaussian likelihood in combination with conservative galaxy selection criteria that minimise calibration uncertainties. Assuming that the 
CDFS is a random pointing, we find $\sigma_8 = 0.68^{+\ldots}_{-\ldots}$ for fixed $\Omega_{\rm m} = 0.25$. In a WMAP5-like cosmology, a value equal to or lower than this
would be expected in $\approx 5\%$ of cases. Taking into account biases arising from the way the CDFS was selected, which we model as being
dependent on the number of haloes in the CDFS, we obtain $\sigma_8 = 0.71^{+\ldots}_{-\ldots}$. Combining the CDFS data with the parameter constraints from
WMAP5 yields $\Omega_{\rm m} = 0.26^{+\ldots}_{-\ldots}$ and $\sigma_8 = 0.79^{+\ldots}_{-\ldots}$ for a flat universe.
1. Introduction

Weak gravitational lensing by the large-scale structure in the
Universe, or cosmic shear, is becoming an increasingly important
tool for constraining cosmological parameters. It is largely
complementary to other cosmological probes like the cosmic
microwave background or the clustering of galaxies, and is
particularly sensitive to the matter density $\Omega_{\rm m}$ and
the normalisation of the matter power spectrum $\sigma_8$.
Important constraints have already been obtained by Benjamin et al.
(2007), who compiled a set of five weak lensing surveys, and from
the CFHT Legacy Survey (Hoekstra et al. 2006; Semboloni et al. 2006;
Fu et al. 2008). In subsequent years, a new generation of surveys
like KIDS or Pan-STARRS (Kaiser & Pan-STARRS Collaboration 2005)
will allow cosmic shear to be measured with statistical
uncertainties that are much smaller than the systematic errors both
on the observational and the theoretical sides. Strong efforts are
now being made to find sources of systematics in the process of
shape measurement and shear estimation (e.g. Massey et al. 2007a).
In addition, new methods of shape measurement are being explored,
such as the shapelet formalism (Refregier & Bacon 2003; Kuijken
2006) or the methods proposed in Bernstein & Jarvis (2002) and
Miller et al. (2007).



It is equally important to have accurate theoretical model
predictions that can be fit to the expected high-quality
measurements. Currently, these models are all based on fitting
formulae for the three-dimensional matter power spectrum derived
from $N$-body simulations as given by Peacock & Dodds (1996) and
more recently by Smith et al. (2003). However, these are only
accurate at best to the percent level on the scales relevant to
this and similar works when compared to ray-tracing simulations
based on state-of-the-art $N$-body simulations (Hilbert et al.
2009), such as the Millennium Run (Springel et al. 2005).
Therefore, there is a strong need for a large ray-tracing effort to
obtain accurate semi-numerical predictions for a range of
cosmological parameters.

While a tremendous effort is currently being directed to the 
solution of these problems, the actual process of parameter esti- 
mation has so far received relatively little attention. Obviously, 
the statistical data analysis has to achieve the same accuracy as 
the data acquisition if the aforementioned efforts are not to be 
wasted. 

The standard procedure for converting measurements of 
second-order cosmic shear statistics into constraints on cosmo- 
logical parameters is to write down a likelihood function and 
to determine the location of its maximum for obtaining esti- 
mates of the cosmological parameters of interest. To make this 
feasible, several approximations are commonly made. Despite 
the shear field being non-Gaussian due to nonlinear structure 
growth, lacking an analytical description the likelihood is most 
often approximated by a multivariate Gaussian distribution. 
The covariance matrix for the Gaussian likelihood then remains 
to be determined, which is an intricate issue by itself. 

In most previous studies, the dependence of the covariance 
matrix on cosmological parameters has been ignored when 
writing down the likelihood function. Instead, it was kept fixed 
to some fiducial cosmological model. The dependence of the 
covariance matrix on the cosmological parameters has been in-
vestigated in Eifler et al. (2008) for the case of Gaussian shear
fields. The authors find that this has a significant effect on the 
constraints on cosmological parameters (reducing the size of 
the credible regions) and will be particularly important for fu- 
ture large-area surveys. 

There are several approaches to determine the covariance for the
fiducial set of parameters: Hoekstra et al. use the covariance
matrix derived for a Gaussian shear field. Although this is rather
easy to compute (Joachimi et al. 2008), the errors are strongly
underestimated, particularly on small scales. Another option is to
estimate the covariance from the data itself (e.g. Massey et al.
2007b). This will become sensible and feasible mostly for the
upcoming large surveys, which can be safely split into smaller
subfields without severely underestimating cosmic variance. A third
possibility, which currently seems to be the most accurate, is to
measure the covariance matrix from a large sample of ray-tracing
simulations. Semboloni et al. (2007) have provided a fitting
formula which allows one to transform covariances computed for
Gaussian shear fields into covariances including non-Gaussianity.
Another promising way, which would also easily allow one to take
into account the dependence on cosmological parameters, is the
semi-analytical computation using the halo model (Scoccimarro et
al.; Cooray & Hu 2001; Takada & Jain 2009).

However, all these works are based on the assumption that the
likelihood is well approximated by a Gaussian. In this paper, we
study the impact of this assumption on the shape of the posterior
probability distribution of the matter density parameter
$\Omega_{\rm m}$ and the power spectrum normalisation $\sigma_8$.
Furthermore, we compute Fisher matrix constraints for the
four-dimensional parameter space spanned by $\Omega_{\rm m}$,
$\sigma_8$, $h_{100}$ and $\Omega_\Lambda$. We propose a method to
numerically compute the likelihood function from a large set of
ray-tracing simulations based on the technique of independent
component analysis (ICA, e.g. Jutten & Hérault 1991; Comon et al.
1991). ICA is a technique for the separation of independent source
signals underlying a set of observed random variables, a
statistical method related to factor analysis and principal
component analysis (PCA). An approach similar to ours, called
projection pursuit density estimation, which we use to verify our
results, was proposed by Friedman et al. (1984).



In their cosmic shear analysis of the combined HST GEMS and GOODS
data of the Chandra Deep Field South, Schrabback et al. (2007) (S07
from hereon) have found a very low value of
$\sigma_8(\Omega_{\rm m} = 0.3) = 0.52^{+\ldots}_{-\ldots}$. In the
second part of this paper, we present a re-analysis of the cosmic
shear data of S07. Using our estimate of the non-Gaussian
likelihood, we investigate whether cosmic variance alone is
responsible for producing the low $\sigma_8$ estimate or whether
the criteria applied by Giacconi et al. (2001) to select a field
suitable for deep X-ray observations have a share in this.

The outline of our paper is as follows: in Sec. 2 we describe our
sample of ray-tracing simulations which we use for the likelihood
estimation. In Sec. 3 we briefly review the lensing quantities
relevant for this paper and Bayesian parameter estimation. We
introduce our method of estimating the "true" likelihood and
illustrate the impact of non-Gaussianity on parameter estimation
using the example of a CDFS-like survey. In Sec. 4 we present the
improved cosmic shear analysis of the CDFS and investigate possible
reasons for the low power spectrum normalisation found in S07.

2. Ray-Tracing simulations 

We have performed a set of 10 $N$-body simulations using the
publicly available code GADGET-2 (Springel 2005), all of which are
realisations of the same WMAP-5-like cosmology
($\Omega_{\rm m} = 0.25$, $\Omega_\Lambda = 0.75$,
$\Omega_{\rm b} = 0.04$, $n_{\rm s} = 1.0$, $\sigma_8 = 0.78$,
$h_{100} = 0.73$). The simulation boxes are
$L_{\rm box} = 150\,h_{100}^{-1}$ Mpc on a side, populated by
$N_{\rm p} = 256^3$ dark matter particles with masses of
$m_{\rm p} = 1.2 \times 10^{10}\,h_{100}^{-1}\,M_\odot$. We have
started the simulations at $z = 50$ and obtained snapshots from
$z = 0$ to $z = 4.5$ in intervals of $\Delta z$ corresponding to the
box size, so that a suitable snapshot is available for each lens
plane.

In the following, we only give a brief description of our
ray-tracing algorithm and refer the reader to, for example,
Jain et al. (2000) or Hilbert et al. (2009) for a more detailed
introduction.

The ray-tracing is performed by dividing the dark matter 
distribution into redshift slices and projecting each slice onto a 
lens plane. Starting at the observer, light rays are shot through 
this array of lens planes. We assume that deflections only take 
place at the planes themselves, and that the rays propagate on 
straight lines in the space between two planes. In our case, each 
redshift slice corresponds to one output box of the $N$-body sim-
ulation and was projected as a whole onto a lens plane, preserv-
ing the periodic boundary conditions of the simulation box. To 
avoid repetition of structure along the line of sight, the planes 
were randomly shifted and rotated. The light rays are shot from 
the observer through the set of lens planes, forming a regular 
grid on the first plane. We then use FFT methods to compute 
the lensing potential on each lens plane, from which we obtain 



Hartlap et al.: The non-Gaussianity of the cosmic shear likelihood 






the deflection angle and its partial derivatives on a grid. The 
ray position and the Jacobian of the lens mapping for each ray 
are obtained by recursion: given the ray position on the current 
lens plane, its propagation direction (known from the position 
on the previous plane), and the deflection angle on the current 
plane interpolated onto the ray, we immediately obtain the ray 
position on the next plane. Differentiation of this recursion for- 
mula with respect to the image plane coordinates yields a sim- 
ilar relation for the Jacobian of the lens mapping, which takes 
into account the previously computed tidal deflection field (for
a detailed description of the formalism used, see Hilbert et al.
2009). The recursion is performed until we reach the redshift
cut-off at $z = 4.5$.

We obtain the final Jacobian for a given source redshift
distribution by performing a weighted average over the Jacobians
for the light paths to each lens plane. Since our aim is to create
mock catalogues comparable to those of the CDFS field, we use the
redshift distribution found for our revised galaxy catalogues
(Smail et al. 1995; see Sec. 4.1):

$$p(z_{\rm s}) = A \left(\frac{z_{\rm s}}{z_0}\right)^{\alpha} \exp\left[-\left(\frac{z_{\rm s}}{z_0}\right)^{\beta}\right] ,$$

where $z_0 = 1.55$, $\alpha = 0.59$, $\beta = 1.35$ and $A$ is a
normalisation constant. This corresponds to a mean source redshift
of $\bar{z}_{\rm s} = 1.54$. We then create the mock source
catalogue by randomly sampling the resulting shear maps with
$N_{\rm s} = n_{\rm s} \Theta^2$ galaxies, where
$n_{\rm s} = 68\,{\rm arcmin}^{-2}$ is the number density of
sources and $\Theta = 0.5^\circ$ is the side length of the
simulated field. In total, we have produced 9600 quasi-independent
realisations of the CDFS field, based on different random shifts
and rotations of the lens planes and the various $N$-body
simulations.

3. The non-Gaussianity of the cosmic shear 
likelihood 

3.1. Cosmic shear 

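Perhaps the most common way to extract the lensing information
from the measured shapes of distant galaxies is to estimate the
two-point correlation functions of the distortion field. One
defines two shear correlation functions (for more details, see
e.g. Schneider 2006)

$$\xi_\pm(\theta) = \langle \epsilon_{\rm t}(\boldsymbol{\vartheta})\, \epsilon_{\rm t}(\boldsymbol{\vartheta} + \boldsymbol{\theta}) \rangle \pm \langle \epsilon_\times(\boldsymbol{\vartheta})\, \epsilon_\times(\boldsymbol{\vartheta} + \boldsymbol{\theta}) \rangle , \qquad (1)$$

where $\epsilon_{\rm t}$ and $\epsilon_\times$ are the tangential
and cross components of the measured ellipticity relative to the
line connecting the two galaxies, and $\theta$ is the angular
separation. An unbiased estimator for the shear correlation
functions for a random distribution of galaxies is given in
Schneider et al. (2002):

$$\hat{\xi}_\pm(\theta) = \frac{1}{N_{\rm p}} \sum_{i,j} \left( \epsilon_{i{\rm t}}\, \epsilon_{j{\rm t}} \pm \epsilon_{i\times}\, \epsilon_{j\times} \right) \Delta_\theta\!\left( |\boldsymbol{\vartheta}_i - \boldsymbol{\vartheta}_j| \right) . \qquad (2)$$

Here, $i$ and $j$ label galaxies at angular positions
$\boldsymbol{\vartheta}_i$ and $\boldsymbol{\vartheta}_j$,
respectively. The function $\Delta_\theta(\phi)$ is 1 if $\phi$
falls into the angular separation bin centred on $\theta$, and is
zero otherwise. Finally, $N_{\rm p}$ is the number of pairs of
galaxies in the bin under consideration.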
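The estimator of Eq. (2) can be illustrated with a short Python sketch. It is not the code used for the analysis in this paper: the function name `xi_pm`, the flat-sky geometry, the representation of ellipticities as complex numbers $\epsilon = \epsilon_1 + {\rm i}\epsilon_2$, the sign convention $\epsilon_{\rm t} + {\rm i}\epsilon_\times = -\epsilon\,{\rm e}^{-2{\rm i}\varphi}$ and the brute-force treatment of all pairs are assumptions made for illustration only.

```python
import numpy as np

def xi_pm(pos, eps, bin_edges):
    """Brute-force estimate of xi_+ and xi_- (illustrative sketch).

    pos       : (N, 2) array of flat-sky galaxy positions
    eps       : (N,) complex ellipticities e1 + i*e2 (assumed convention)
    bin_edges : edges of the angular separation bins
    """
    pos = np.asarray(pos, dtype=float)
    eps = np.asarray(eps, dtype=complex)
    i, j = np.triu_indices(len(pos), k=1)       # count each pair once
    d = pos[j] - pos[i]
    sep = np.hypot(d[:, 0], d[:, 1])
    phi = np.arctan2(d[:, 1], d[:, 0])          # direction of connecting line
    rot = np.exp(-2j * phi)
    ei = -eps[i] * rot                          # real: tangential, imag: cross
    ej = -eps[j] * rot
    tt = ei.real * ej.real
    xx = ei.imag * ej.imag
    nbins = len(bin_edges) - 1
    k = np.digitize(sep, bin_edges) - 1         # bin index of each pair
    ok = (k >= 0) & (k < nbins)
    npairs = np.bincount(k[ok], minlength=nbins)
    sum_p = np.bincount(k[ok], weights=(tt + xx)[ok], minlength=nbins)
    sum_m = np.bincount(k[ok], weights=(tt - xx)[ok], minlength=nbins)
    with np.errstate(invalid="ignore", divide="ignore"):
        return sum_p / npairs, sum_m / npairs, npairs   # NaN for empty bins
```

For catalogues of the size considered here, the quadratic cost of looping over all pairs would be prohibitive in practice, and a tree-based pair search would normally replace the explicit pair list.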



3.2. Parameter estimation 

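Let us assume that we have measured the shear correlation
functions $\xi_\pm(\theta_i)$ on $p/2$ angular separation bins
$\theta_i$ and now wish to infer some parameters
$\boldsymbol{\pi}$ of our model $\boldsymbol{m}(\boldsymbol{\pi})$
for $\xi_\pm(\theta_i)$. For what follows, we define the joint data
vector $\boldsymbol{\xi} = (\boldsymbol{\xi}_+, \boldsymbol{\xi}_-)^{\rm t}$,
which in total is supposed to have $p$ entries.

Adopting a Bayesian point of view, our aim is to compute the
posterior likelihood, i.e. the probability distribution of a
parameter vector $\boldsymbol{\pi}$ given the information provided
by the data $\boldsymbol{\xi}$:

$$p(\boldsymbol{\pi}|\boldsymbol{\xi}) = \frac{p(\boldsymbol{\xi}|\boldsymbol{\pi})\, p(\boldsymbol{\pi})}{p(\boldsymbol{\xi})} . \qquad (3)$$

Here, $p(\boldsymbol{\pi})$ is the prior distribution of the
parameters, which incorporates our knowledge about
$\boldsymbol{\pi}$ prior to looking at the data; such knowledge can
originate from previous measurements or theoretical arguments. The
evidence $p(\boldsymbol{\xi})$ in this context simply serves as a
normalisation factor. Hitherto, it has been assumed that the
likelihood $p(\boldsymbol{\xi}|\boldsymbol{\pi})$ is a Gaussian
distribution:

$$p(\boldsymbol{\xi}|\boldsymbol{\pi}) = \frac{1}{(2\pi)^{p/2} \left[\det C(\boldsymbol{\pi})\right]^{1/2}} \exp\left\{ -\frac{1}{2} \left[\boldsymbol{\xi} - \boldsymbol{m}(\boldsymbol{\pi})\right]^{\rm t} C^{-1}(\boldsymbol{\pi}) \left[\boldsymbol{\xi} - \boldsymbol{m}(\boldsymbol{\pi})\right] \right\} , \qquad (4)$$

where $C(\boldsymbol{\pi})$ is the covariance matrix of
$\boldsymbol{\xi}$, as predicted by the underlying model. Usually,
however, the dependence of the covariance matrix upon cosmological
parameters is not taken into account. Rather, the covariance that
is computed for a fixed fiducial set of parameters
$\boldsymbol{\pi}_0$ is used in Eq. (4). Under this approximation,
the likelihood is a function of the difference
$\boldsymbol{\Delta}(\boldsymbol{\pi}) = \boldsymbol{\xi} - \boldsymbol{m}(\boldsymbol{\pi})$
only:

$$p(\boldsymbol{\xi}|\boldsymbol{\pi}) = L_{\boldsymbol{\pi}_0}[\boldsymbol{\Delta}(\boldsymbol{\pi})] . \qquad (5)$$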
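As a point of reference for what follows, the Gaussian likelihood with a fixed fiducial covariance can be evaluated in a few lines. The snippet below is a minimal sketch under stated assumptions: the function name `gaussian_log_likelihood` and its argument names are hypothetical, and the covariance is assumed to have been estimated beforehand for the fiducial parameters.

```python
import numpy as np

def gaussian_log_likelihood(xi, model, cov):
    """Log of the Gaussian likelihood for a fixed covariance matrix.

    xi, model : data vector and model prediction, both of length p
    cov       : (p, p) covariance matrix for the fiducial parameters
    """
    delta = np.asarray(xi, dtype=float) - np.asarray(model, dtype=float)
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance matrix must be positive definite")
    # Solve C x = delta instead of forming the inverse explicitly.
    chi2 = delta @ np.linalg.solve(cov, delta)
    p = delta.size
    return -0.5 * (chi2 + logdet + p * np.log(2.0 * np.pi))
```

Because the covariance is kept fixed, the result depends on the parameters only through the difference between data and model, as stated in the text.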



3.3. Estimating the likelihood

The choice of the functional form of the likelihood as given by
Eq. (4) is only approximate. Since the underlying shear field in
the correlation function measurement becomes non-Gaussian 
in particular on small scales due to nonlinear structure forma- 
tion, there is no good reason to expect the distribution of the 
shear correlation function to be Gaussian. Our aim therefore is 
to use a very large sample of ray-tracing simulations to estimate 
the likelihood and explore the effects of the deviations from a 
Gaussian shape on cosmological parameter constraints. 

In this work, we have to retain the approximation that the
functional form of the likelihood does not depend on cosmology in
order to keep computation time manageable. Our ray-tracing
simulations were all done for identical cosmological parameters,
which is our fiducial parameter vector $\boldsymbol{\pi}_0$. Thus,
as in Eq. (5), the likelihood depends on cosmology only through the
difference
$\boldsymbol{\Delta}(\boldsymbol{\pi}) = \boldsymbol{\xi} - \boldsymbol{m}(\boldsymbol{\pi})$.

Since $L_{\boldsymbol{\pi}_0}$ is the probability of obtaining the
data $\boldsymbol{\xi}$ given the parameters $\boldsymbol{\pi}_0$,
we in principle have to estimate the $p$-dimensional distribution
of $\boldsymbol{\xi}$ from our sample of ray-tracing simulations.
However, due to the high dimensionality of the problem, a brute
force approach to estimate the full joint distribution is hopeless.
The problem would simplify considerably if we could find a
transformation

$$\boldsymbol{s} = f[\boldsymbol{\Delta}(\boldsymbol{\pi})] , \qquad (6)$$

such that

$$p_{\boldsymbol{s}}(\boldsymbol{s}|\boldsymbol{\pi}_0) = \prod_{i=1}^{n_{\rm IC}} p_{s_i}(s_i|\boldsymbol{\pi}_0) . \qquad (7)$$

Here, $f$ is in general a mapping from $\mathbb{R}^p$ to
$\mathbb{R}^{n_{\rm IC}}$ ($n_{\rm IC} \le p$) and
$\boldsymbol{s} \in \mathbb{R}^{n_{\rm IC}}$ is our new data
vector. This would reduce the problem to estimating $n_{\rm IC}$
one-dimensional probability distributions instead of a single
$p$-dimensional one. Eq. (7) is equivalent to the statement that we
are looking for a new set of basis vectors of
$\mathbb{R}^{n_{\rm IC}}$ in which the components $s_i$ of the
shear correlation function are statistically independent. It is
virtually impossible to find the (in general nonlinear) mapping
$f$. However, it is possible to make progress if we make the ansatz
that $f$ is linear:

$$\boldsymbol{s} = A\, \boldsymbol{\Delta}(\boldsymbol{\pi}) , \qquad (8)$$

where $A \in \mathbb{R}^{n_{\rm IC} \times p}$ is the
transformation or "un-mixing" matrix.

Our likelihood estimation procedure is as follows: the first step
is to remove first-order correlations from the data vector by
performing a PCA (e.g. Press et al. 1992). This yields a basis in
which the components of $\boldsymbol{\xi}$ are uncorrelated. If we
knew that the distribution of $\boldsymbol{\xi}$ were Gaussian,
this would be sufficient, because in this case uncorrelatedness is
equivalent to statistical independence. However, for a general
distribution, uncorrelatedness is only a necessary condition for
independence. Since we suspect that the likelihood is
non-Gaussian, a second change of basis, determined by the ICA
technique (described in detail in the next section), is carried
out, which then results in the desired independence. We then use a
kernel density method (see e.g. Hastie et al. 2001; Venables &
Ripley 2002, and references therein) to estimate and tabulate the
one-dimensional distributions $p_{s_i}(s_i|\boldsymbol{\pi}_0)$ in
this new basis. The density estimate is constructed by smoothing
the empirical distribution function of the observations of $s_i$,

$$\hat{p}^{\rm emp}_{s_i}(s_i) = \frac{1}{N} \sum_{j=1}^{N} \delta_{\rm D}\!\left(s_i - s_i^{(j)}\right) , \qquad (9)$$

where $s_i^{(j)}$ is the $j$-th of $N$ observations of $s_i$ and
$\delta_{\rm D}$ is the Dirac delta-function, with a smooth kernel
$K$. The estimate $\hat{p}_{s_i}$ of the desired density $p_{s_i}$
then is given by

$$\hat{p}_{s_i}(s_i) = \frac{1}{N b} \sum_{j=1}^{N} K\!\left(\frac{s_i - s_i^{(j)}}{b}\right) , \qquad (10)$$

where $s_i^{(j)}$ is the $j$-th of $N$ observations of $s_i$ and
$b$ is the bandwidth. For the kernel $K$ we use a Gaussian
distribution. It has been shown that the shape of the kernel $K$ is
of secondary importance for the quality of the density estimate;
much more important is the choice of the bandwidth $b$. If $b$ is
too small, $\hat{p}_{s_i}$ is essentially unbiased, but tends to
have a high variance because the noise is not properly smoothed
out. On the other hand, choosing a bandwidth that is too large
results in a smooth estimate with low variance, but a higher bias,
because real small-scale features of the probability density are
smeared out. Our choice of the bandwidth is based on the "rule of
thumb" (e.g. Silverman 1986; Scott 1992; Davison 2003):
$b = 0.9 \min(\hat{\sigma}, R/1.34)\, N^{-1/5}$. Here,
$\hat{\sigma}$ is the sample standard deviation and $R$ is the
inter-quartile range of the sample.
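The kernel density estimate of Eq. (10) with a Gaussian kernel and the rule-of-thumb bandwidth can be sketched as follows. This is a minimal illustration rather than the implementation used for this work; the function names `rule_of_thumb_bandwidth` and `gaussian_kde_1d` are hypothetical, and the quartiles are computed with NumPy's default (linear interpolation) percentile definition, which is an assumption.

```python
import numpy as np

def rule_of_thumb_bandwidth(x):
    """b = 0.9 * min(sigma_hat, R / 1.34) * N**(-1/5)."""
    x = np.asarray(x, dtype=float)
    sigma_hat = x.std(ddof=1)                   # sample standard deviation
    q75, q25 = np.percentile(x, [75, 25])       # inter-quartile range R
    return 0.9 * min(sigma_hat, (q75 - q25) / 1.34) * x.size ** (-0.2)

def gaussian_kde_1d(samples, grid, b=None):
    """Evaluate the Gaussian-kernel density estimate on the points in grid."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    if b is None:
        b = rule_of_thumb_bandwidth(samples)
    u = (grid[:, None] - samples[None, :]) / b  # shape (n_grid, N)
    kernel = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    return kernel.sum(axis=1) / (samples.size * b)
```

Tabulating the estimate once on a fine grid and interpolating afterwards avoids re-evaluating the sum over all $N$ observations for every point in parameter space.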

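Constraints on cosmological parameters can now be derived as
follows: we transform our set of model vectors and the measured
correlation function to the new ICA basis:

$$\hat{\boldsymbol{m}}(\boldsymbol{\pi}) = A\, \boldsymbol{m}(\boldsymbol{\pi}) , \qquad (11)$$
$$\hat{\boldsymbol{\xi}} = A\, \boldsymbol{\xi} , \qquad (12)$$

so that
$\boldsymbol{s} = \hat{\boldsymbol{\xi}} - \hat{\boldsymbol{m}}(\boldsymbol{\pi})$.
The ICA posterior distribution is then given by

$$p(\boldsymbol{\pi}|\boldsymbol{\xi}) \propto p(\boldsymbol{\pi}) \prod_{i=1}^{n_{\rm IC}} p_{s_i}\!\left(\hat{\xi}_i - \hat{m}_i(\boldsymbol{\pi}) \,\big|\, \boldsymbol{\pi}_0\right) . \qquad (13)$$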
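The factorised likelihood entering the ICA posterior can be evaluated as a sum of one-dimensional log-densities. The sketch below assumes that the un-mixing matrix and the simulated source signals are already available; the function name `ica_log_likelihood` and all argument names are hypothetical, and the densities are evaluated directly with a Gaussian kernel instead of from a pre-computed table.

```python
import numpy as np

def ica_log_likelihood(xi, model, unmix, s_samples, bandwidths):
    """Sum of one-dimensional log-densities in the ICA basis.

    xi, model  : measured and predicted correlation functions (length p)
    unmix      : (n_ic, p) un-mixing matrix A
    s_samples  : (n_ic, N) source signals of the N simulated realisations
    bandwidths : (n_ic,) kernel bandwidth for each component
    """
    bandwidths = np.asarray(bandwidths, dtype=float)
    s = unmix @ (np.asarray(xi, dtype=float) - np.asarray(model, dtype=float))
    u = (s[:, None] - s_samples) / bandwidths[:, None]
    kernel = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    density = kernel.mean(axis=1) / bandwidths   # Gaussian-kernel estimate
    return np.sum(np.log(density))
```

Adding the logarithm of the prior to the returned value gives the logarithm of the unnormalised posterior for one point in parameter space.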



3.4. Independent Component Analysis 
We now briefly outline the ICA method (Hyvärinen et al. 2001;
Hyvärinen & Oja 2000), which we use to find the new basis in
$\mathbb{R}^{n_{\rm IC}}$ in which the components of
$\boldsymbol{\Delta}$ are (approximately) statistically
independent. ICA is best introduced by assuming that the data at
hand were generated by the following linear model:

$$\boldsymbol{\Delta} = M \boldsymbol{s} , \qquad (14)$$

where $\boldsymbol{s}$ is a vector of statistically independent
source signals with non-Gaussian probability distributions and $M$
is the mixing matrix. For simplicity, we will from now on only
consider the case $n_{\rm IC} = p$, in which case the mixing matrix
$M$ is simply the inverse of the un-mixing matrix $A$ in Eq. (8).
The goal of ICA is to estimate both $M$ and $\boldsymbol{s}$ from
the data.

An intuitive, though slightly hand-waving way to understand how ICA
works is to note that a set of linear combinations $Y_i$ of
independent, non-Gaussian random variables $X_j$ will usually have
distributions that are more Gaussian than the original
distributions of the $X_j$ (Central Limit Theorem). Reversing this
argument, this suggests that the $X_j$ could be recovered from a
sample of the $Y_i$ by looking for linear combinations of the $Y_i$
that have the least Gaussian distributions. These linear
combinations will also be close to statistically independent. A
more rigorous justification of the method can be found in
Hyvärinen et al. (2001).

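The ICA algorithm consists of two parts, the first of which is a
preprocessing step: after subtracting the mean
$\bar{\boldsymbol{\Delta}} = \langle \boldsymbol{\Delta} \rangle$
from $\boldsymbol{\Delta}$, the data is whitened, i.e. a linear
transformation
$\tilde{\boldsymbol{\Delta}} = L(\boldsymbol{\Delta} - \bar{\boldsymbol{\Delta}})$
is introduced such that
$\langle \tilde{\boldsymbol{\Delta}} \tilde{\boldsymbol{\Delta}}^{\rm t} \rangle = E$,
where $E$ is the unit matrix. This can be achieved by the
eigen-decomposition of the covariance matrix $C = U D U^{\rm t}$ of
$\boldsymbol{\Delta}$, where $D = {\rm diag}(d_1, \ldots, d_p)$, by
choosing $L = D^{-1/2} U^{\rm t}$. Note that $U$ is orthonormal and
that $d_i \ge 0$ for all $i$. As will be discussed below, each
source signal $s_i$ can only be determined up to a multiplicative
constant using ICA. We choose these factors such that
$\langle \boldsymbol{s} \boldsymbol{s}^{\rm t} \rangle = E$. The
effect of the whitening is that the new mixing matrix
$\tilde{M} = L M$ between $\tilde{\boldsymbol{\Delta}}$ and
$\boldsymbol{s}$ is orthogonal. This can be seen as follows:
$E = \langle \tilde{\boldsymbol{\Delta}} \tilde{\boldsymbol{\Delta}}^{\rm t} \rangle = \tilde{M} \langle \boldsymbol{s} \boldsymbol{s}^{\rm t} \rangle \tilde{M}^{\rm t}$.
Since we have chosen
$\langle \boldsymbol{s} \boldsymbol{s}^{\rm t} \rangle = E$, the
claim follows.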
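The whitening step described above amounts to a few lines of linear algebra. The following is a minimal sketch, assuming that the data are stored as an array with one realisation per row and that all eigenvalues of the sample covariance are strictly positive; the function name `whiten` is hypothetical.

```python
import numpy as np

def whiten(delta):
    """Centre and whiten the rows of delta (N samples x p components).

    Returns the whitened data, the whitening matrix L = D^{-1/2} U^t
    and the mean that was subtracted.
    """
    delta = np.asarray(delta, dtype=float)
    mean = delta.mean(axis=0)
    centred = delta - mean
    cov = np.cov(centred, rowvar=False)          # sample covariance C
    d, u = np.linalg.eigh(cov)                   # C = U diag(d) U^t
    whitening = (u / np.sqrt(d)).T               # D^{-1/2} U^t
    return centred @ whitening.T, whitening, mean
```

By construction, the sample covariance of the returned data is the unit matrix, so that any remaining mixing between the components is a pure rotation.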

After the preprocessing, the components of
$\tilde{\boldsymbol{\Delta}}$ are uncorrelated. This would be
equivalent to statistical independence if their distributions were
Gaussian. However, as this is not the case here, a further step is
needed. It consists of finding a new set of orthogonal vectors
$\boldsymbol{w}_i$ (the row vectors of
$\tilde{M}^{-1} = \tilde{M}^{\rm t}$) such that the distributions
$p_{z_i}(z_i)$ of

$$z_i = \tilde{\boldsymbol{\Delta}} \cdot \boldsymbol{w}_i \qquad (15)$$

maximise a suitable measure of non-Gaussianity. A common method to
achieve this is to minimise the entropy (or approximations thereof)
of the $z_i$, which is defined by

$$H(z_i) = -\int {\rm d}y\; p_{z_i}(y) \log p_{z_i}(y) . \qquad (16)$$

Since it can be shown that the Gaussian distribution has the
largest entropy of all distributions of equal variance, this can be
rewritten as maximising the so-called negentropy of the $z_i$,
defined by

$$J(z_i) = H(z_i^{\rm Gauss}) - H(z_i) . \qquad (17)$$

Here, $z_i^{\rm Gauss}$ is a Gaussian random variable with the same
variance as $z_i$, and $J(z_i) \ge 0$. Starting from randomly
chosen initial directions $\boldsymbol{w}_i$, the algorithm tries
to maximise $J(z_i)$ iteratively (in practice, it is sufficient to
use a simple approximation to the negentropy). For more details,
the reader is again referred to Hyvärinen et al. (2001).
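To make the notion of an approximate negentropy concrete, the sketch below uses the log-cosh contrast, $J(z) \approx \left[\langle G(z) \rangle - \langle G(\nu) \rangle\right]^2$ with $G(u) = \log\cosh u$ and $\nu$ a standard Gaussian variable, which is one of the approximations discussed in the ICA literature. Which approximation the implementation used for this work employs is not stated in the text, so this choice, the function name `negentropy_logcosh` and the omission of the overall proportionality constant are assumptions.

```python
import numpy as np

def negentropy_logcosh(z):
    """Approximate negentropy of a sample, up to a constant factor.

    The sample is standardised first; the Gaussian reference value
    <log cosh(nu)> is computed by Gauss-Hermite quadrature.
    """
    z = np.asarray(z, dtype=float)
    z = (z - z.mean()) / z.std()
    # Nodes and weights for the weight function exp(-x^2 / 2).
    x, w = np.polynomial.hermite_e.hermegauss(64)
    gauss_ref = np.sum(w * np.log(np.cosh(x))) / np.sqrt(2.0 * np.pi)
    return (np.mean(np.log(np.cosh(z))) - gauss_ref) ** 2
```

The value is close to zero for a Gaussian sample and grows as the distribution departs from Gaussianity, which is the property exploited when searching for the directions $\boldsymbol{w}_i$.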

ICA suffers from several ambiguities, none of which, however, is
crucial for this work. First of all, the amplitudes of the source
signals cannot be determined, since any prefactor $\lambda$ to the
signal $s_i$ can be cancelled by multiplication of the
corresponding column of the mixing matrix by $1/\lambda$. Secondly,
the order of the independent components is not determined, since
any permutation of the $s_i$ can be accommodated by corresponding
changes to $M$. Thirdly, ICA does not yield a unique answer if at
least some of the $s_i$ are Gaussian: the subset of Gaussian
signals is only determined up to an orthogonal transformation. This
is not an issue in our context, since the Gaussian signals will be
uncorrelated thanks to the preprocessing steps, and
uncorrelatedness implies statistical independence for Gaussian
random variables.

Several interpretations of ICA and algorithms exist and are
described in detail in Hyvärinen et al. (2001). In this work, we
use an implementation of the fastICA algorithm (Hyvärinen & Oja
1997) for the R language$^1$.
3.5. Tests

In this section, we present the results of a number of tests we
have performed to ensure that our results are not affected by
convergence issues or statistical biases of any kind.

The fastICA algorithm requires a set of randomly chosen directions
$\boldsymbol{w}_i$ as initial conditions. It then iteratively
computes corrections to these vectors in order to increase the
negentropy of the projections of the data vectors onto these
directions (Eq. 15), followed by an orthonormalisation step. It is
not clear a priori whether the algorithm will settle in the same
negentropy maxima for different sets of initial vectors. This
concern is backed by the fact that at least some of the
$p_{s_i}(s_i|\boldsymbol{\pi}_0)$ might be very close to Gaussian,
which might hamper convergence even further. We have therefore
tested whether we obtain the same set of basis vectors from a large
number of different initial conditions. We find that this is indeed
the case for those $\boldsymbol{w}_i$ for which the distribution
$p_{z_i}(z_i = \tilde{\boldsymbol{\Delta}} \cdot \boldsymbol{w}_i)$
departs significantly from a Gaussian. As expected, the directions
leading to a rather Gaussian $p_{z_i}$ are different for different
starting values, reflecting the inability of ICA to distinguish
between Gaussian source signals. However, the posterior
distributions derived using our algorithm do not differ notably
when using different initial conditions. This is even true if the
fastICA algorithm does not formally converge (i.e. when the
differences of some of the basis vectors between two iterations are
not small): after a few hundred iterations, the non-Gaussian
directions are determined and do not change anymore. The reason for
not reaching convergence is that the algorithm still tries to find
negentropy maxima in the subspace of Gaussian directions.

Fig. 1. Area of the 68% (dashed lines) and 99% (solid lines)
credible regions in the $\Omega_{\rm m}$-$\sigma_8$-plane as
function of the sample size $N$, for the Gaussian likelihood (red,
upper curves) and the likelihood computed using the ICA algorithm
(black, lower curves). Blue lines are the predicted areas based on
Eq. (18).

As has been noted in Hartlap et al. (2007), statistical biases can
become significant already for the Gaussian approximation of the
likelihood (Eq. 4): care has to be taken if the covariance matrix
of the correlation function (given on $p$ bins) is estimated from a
finite set of $N$ simulations or observations. Inverting the
estimated covariance yields a biased estimate of the inverse:

$$\langle \hat{C}^{-1} \rangle = \frac{N - 1}{N - p - 2}\, \Sigma^{-1} \quad {\rm for} \quad p < N - 2 , \qquad (18)$$

where $\hat{C}$ is the estimated and $\Sigma$ the true covariance
matrix. This bias leads to an underestimation of the size of
credible regions by a factor of
$(N - p - 2)/(N - 1) \approx 1 - p/N$. We suspect that a similar
bias occurs in our likelihood estimation procedure. In Fig. 1 we
therefore plot the area of the 68% and 99% credible regions of the
posterior distribution for $\Omega_{\rm m}$ and $\sigma_8$ (keeping
all other cosmological parameters fixed to their fiducial values)
as functions of the number $N$ of observations of the correlation
functions used to estimate the ICA transformation (black curves).
To exclude noise effects from the analysis, we use the theoretical
prediction of the correlation function for the fiducial
cosmological parameters as data vector. We set $p = 30$ throughout.
For comparison, we also show the areas computed using the Gaussian
likelihood (red curves). In the latter case, the bias predicted by
Eq. (18) is clearly visible as a decrease of the area when $N$
becomes small. The ICA method suffers from a similar bias, although
the behaviour at small $N$ seems to be slightly different. More
important, though, is the fact that this bias is unimportant for
reasonably large sample sizes ($N > 2000$). Since we always use the
full sample ($N = 9600$) in the following, this bias is completely
negligible.

1 http://www.r-project.org/
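The bias of the inverted sample covariance can be checked with a small Monte Carlo experiment. The sketch below draws Gaussian data with unit covariance, inverts the unbiased sample covariance and averages the diagonal of the inverse, which should approach $(N - 1)/(N - p - 2)$. The function name `mean_inverse_covariance_factor` and the chosen numbers of trials are illustrative assumptions.

```python
import numpy as np

def mean_inverse_covariance_factor(n_sims, p, n_trials=4000, seed=0):
    """Average diagonal element of the inverted sample covariance.

    The true covariance is the unit matrix, so the returned number
    directly estimates the multiplicative bias of the inverse.
    """
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_trials):
        x = rng.standard_normal((n_sims, p))     # n_sims realisations
        c_hat = np.cov(x, rowvar=False)          # unbiased covariance
        total += np.trace(np.linalg.inv(c_hat)) / p
    return total / n_trials
```

For example, `mean_inverse_covariance_factor(20, 5)` can be compared with the prediction $19/13$, while for $N = 9600$ and $p = 30$ the predicted factor differs from unity by less than one percent.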

Our method to estimate the likelihood crucially depends on the
assumption that a linear transformation makes the components of
the shear correlation vectors statistically independent. A
necessary condition for mutual statistical independence of all
$s_i$ is pairwise independence. The components $i$ and $j$ are
called pairwise statistically independent if
$p(s_i, s_j) = p_{s_i}(s_i)\, p_{s_j}(s_j)$. We therefore compare
the joint pairwise distributions $p(s_i, s_j)$ to the product
distributions $p_{s_i}(s_i)\, p_{s_j}(s_j)$, where we estimate
$p(s_i, s_j)$ using a two-dimensional extension (using a bi-variate
Gaussian kernel) of the kernel density method given by Eq. (10). We
give two examples in Fig. 2, where we compare the joint and product
distributions of the two most non-Gaussian components and two
nearly Gaussian components. As expected, a simple PCA is not enough
to achieve pairwise statistical independence in the non-Gaussian
case. Pairwise independence is achieved only after performing the
ICA.

[Figure 2: panels "Components 1-2" and "Components 9-10", each showing ICA (left) and PCA (right).]

Fig. 2. Comparison of the joint distributions $p(s_i, s_j)$ (black
dashed contours) and the product $p_{s_i}(s_i)\, p_{s_j}(s_j)$
(solid red contours) for the two most non-Gaussian components
($i = 1$, $j = 2$) and two rather Gaussian ones ($i = 9$,
$j = 10$), after performing ICA (left panels) and PCA (right
panels). The components have been ranked and labelled according to
their non-Gaussianity; the $i$-th PCA component is in general not
the same as the $i$-th ICA component. In the right panel of each
plot, the distributions with respect to the PCA basis vectors are
shown and in the left panel, the distributions in the ICA basis are
displayed. Statistical independence is indicated by
$p(s_i, s_j) = p_{s_i}(s_i)\, p_{s_j}(s_j)$.

A more rigorous test for mutual statistical independence for the
multivariate, continuous case was proposed by Chiu et al. (2003).
It is based on the observation that if $x$ is a continuous random
variable and $P(x)$ is its cumulative distribution function (CDF),
then $z = P(x)$ is uniformly distributed in $[0, 1]$. If we are
given a set of statistically independent random variables $s_i$,
this means that the joint distribution of $z_i = P_i(s_i)$, where
again $P_i$ is the CDF of $s_i$, is uniform in the multidimensional
unit cube. On the other hand, if the assumption of statistical
independence of the $s_i$ is violated, the joint density
$p_{\boldsymbol{z}}$ of the $z_i$ is given by

$$p_{\boldsymbol{z}}(\boldsymbol{z}) = p_{\boldsymbol{z}}[P_1(s_1), \ldots, P_n(s_n)] = p_{\boldsymbol{s}}(s_1, \ldots, s_n) \left| \frac{\partial \boldsymbol{z}}{\partial \boldsymbol{s}} \right|^{-1} = \frac{p_{\boldsymbol{s}}(s_1, \ldots, s_n)}{\prod_{i=1}^{n} p_i(s_i)} . \qquad (19)$$

Here, $p_i(s_i)$ is the distribution function of $s_i$ only and
$p_{\boldsymbol{s}}$ is the joint distribution function of
$s_1, \ldots, s_n$. This means that the joint distribution of the
$z_i$ is not uniform if the $s_i$ are statistically dependent.
Therefore, we can test if the $s_i$ we obtain from the ICA
procedure are indeed independent by computing their empirical
cumulative distribution functions, carrying out the above
transformation and finally testing for multivariate uniformity.
Such a test was described in Liang et al. (2001), to which we refer
the reader for more details.
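The transformation underlying this test is easy to sketch. The snippet below maps each component to the unit interval using its empirical CDF and then computes a simple chi-square statistic on a two-dimensional histogram of a pair of transformed components. This pairwise statistic is a simplified stand-in chosen for illustration and is not the multivariate uniformity test of Liang et al. (2001); the function names `pit_transform` and `pairwise_uniformity_chi2` and the number of bins are assumptions.

```python
import numpy as np

def pit_transform(s):
    """Map each column of s (N samples x n components) to (0, 1)
    using the rank-based empirical CDF of that column."""
    s = np.asarray(s, dtype=float)
    ranks = np.argsort(np.argsort(s, axis=0), axis=0)   # 0 .. N-1
    return (ranks + 1.0) / (s.shape[0] + 1.0)

def pairwise_uniformity_chi2(z, i, j, bins=5):
    """Chi-square statistic of the (z_i, z_j) histogram against a
    uniform distribution on the unit square.

    Because the marginals are uniform by construction, the statistic
    has approximately (bins - 1)**2 degrees of freedom if the two
    components are independent.
    """
    counts, _, _ = np.histogram2d(z[:, i], z[:, j], bins=bins,
                                  range=[[0.0, 1.0], [0.0, 1.0]])
    expected = z.shape[0] / bins**2
    chi2 = np.sum((counts - expected) ** 2 / expected)
    return chi2, (bins - 1) ** 2
```

A pair of components that is uncorrelated but dependent, for example a variable and its square, produces a strongly non-uniform histogram and hence a statistic far above the number of degrees of freedom.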



Applying the test to the $s_i$ that we have obtained from our
ICA procedure, we have to reject statistical independence at
99% confidence. This means that the ICA does not remove all
dependencies between the components of the shear correlation
function. This result, however, does not give an indication of
how these residual dependencies affect our likelihood estimate
and the conclusions regarding constraints on cosmological
parameters. We therefore compare the constraints derived from
the ICA likelihood with the constraints from the likelihood

Fig. 4. Comparison of the posterior likelihoods for
($\Omega_{\rm m}$, $\sigma_8$), computed using the ICA
likelihood (left panel) and the Gaussian approximation
(right panel). Shown are the contours of the 68%, 95% and 99%
credible regions. The maximum of the ICA posterior is denoted
by 'x'; the maximum of the posterior based on the Gaussian
likelihood coincides with the fiducial parameter set and is
marked by the symbol 'o'.

Fig. 3. Comparison of the posterior likelihoods for
($\Omega_{\rm m}$, $\sigma_8$), computed using the ICA likelihood
(black contours) and the PPDE likelihood (red contours). Shown are
the contours of the 68%, 95% and 99% credible regions.



estimated using an alternative method, called projection pursuit
density estimation (PPDE; Friedman et al. 1984), which
we describe in detail in App. A. This method is free from any
assumptions regarding statistical independence and therefore
provides an ideal cross-check for the ICA method. For the
comparison, we have computed the shear correlation functions
with $p = 10$, and we also use $n_{\rm IC} = 10$ independent
components. The resulting contours in the
$\Omega_{\rm m}$-$\sigma_8$-plane are shown in Fig. 3. Both
posterior likelihoods are very similar, although the
credible regions of the PPDE posterior have a slightly smaller
area than the contours of the ICA posterior (which actually
supports the findings presented in the next section). Given the good
agreement of the two methods, we will henceforth only make
use of the ICA procedure, which is considerably faster and
numerically less contrived than PPDE.



3.6. Results on the posterior 

The most interesting question is how much the posterior
distribution computed from the non-Gaussian ICA likelihood will
differ from the Gaussian approximation. We have investigated
this for the case of the CDFS and the parameter set
($\Omega_{\rm m}$, $\sigma_8$). Here and henceforth, we use 15
angular bins each for $\xi_+$ and $\xi_-$ in the range from 12" to
30', i.e. $p = 30$. For the data vector,
we do not use the correlation functions from our simulations,
but take the theoretical prediction for our fiducial parameter set
instead. This allows us to study the shape of the posterior
likelihood independent of noise in the data and biases due to the fact
that the theoretical model does not quite match the mean
correlation function from the simulations. In Fig. 4 we show the
contours of the posterior computed in this way from the
likelihood estimated using our ICA method (left panel) and from
the Gaussian likelihood (right panel). We have assumed
$\sigma_{|\epsilon|} = 0.45$ for the
dispersion of the intrinsic galaxy ellipticities. The shape of the
ICA posterior is different from that of the Gaussian
approximation in three respects: it is steeper (leading to smaller
credible regions), the maximum is shifted towards higher $\sigma_8$
and lower $\Omega_{\rm m}$, and the contours are slightly tilted.
The first two differences can be traced back to the shape of the
distributions of the individual ICA components: most of the
distribution functions $p_{s_i}$ are generally slightly steeper than
a Gaussian and most of the non-Gaussian components are in addition
strongly skewed, thus shifting the peak of the posterior. Generally,
these differences are more pronounced in the direction of the
$\Omega_{\rm m}$-$\sigma_8$-degeneracy and towards lower values of
both parameters, where the posterior is shallower.

Of more practical relevance is how the parameter constraints
change when the ICA likelihood is used for the analysis
of large weak lensing surveys. Here, we consider surveys
consisting of $N_{\rm f}$ CDFS-like fields. Bayesian theory states
that if $N_{\rm f}$ is large enough, the posterior probability
distribution of the parameters becomes Gaussian, centred on the true
parameter values, with covariance matrix $(N_{\rm f} F)^{-1}$
(e.g. Gelman et al. 2004). Here, $F$ is the Fisher matrix
(Kendall et al. 1987), which is defined by

\begin{equation}
F_{\alpha\beta} = \left\langle
 \frac{\partial \log L}{\partial \pi_\alpha}
 \frac{\partial \log L}{\partial \pi_\beta} \right\rangle ,
\tag{20}
\end{equation}

Fig. 5. Fisher matrix constraints for a hypothetical
1500 deg$^2$ survey, consisting of 6000 CDFS-like fields. The
plots on the diagonal show the 1D marginals, the off-diagonal
plots the 2D marginals derived from the full 4D posterior. The
red dashed (black solid) lines/contours have been computed
using the Fisher matrix of the Gaussian likelihood (the ICA
likelihood).



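where $\langle\cdot\rangle$ denotes the expectation value with respect
to the likelihood function. If the likelihood is Gaussian and if the
covariance matrix $C$ does not depend on cosmology, one can show
that

\begin{equation}
F_{\alpha\beta} = \sum_{i,j}
 \frac{\partial m_i(\pi)}{\partial \pi_\alpha}
 \left(C^{-1}\right)_{ij}
 \frac{\partial m_j(\pi)}{\partial \pi_\beta} .
\tag{21}
\end{equation}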



Eq. (20) provides us with a way to estimate the Fisher matrix
for the non-Gaussian likelihood. For each ray-tracing
realisation of the CDFS, we compute the logarithm of the
posterior distribution $\log p(\pi|\xi)$ and its derivatives with
respect to the cosmological parameters at the fiducial parameter
values. Since we use uniform priors for all cosmological parameters,
the derivatives of the log-posterior are identical to those of the
log-likelihood. We can then compute the Fisher matrix by
averaging over all realisations:

\begin{equation}
F_{\alpha\beta} \approx \frac{1}{N} \sum_{k=1}^{N}
 \frac{\partial \log p(\pi|\xi^{(k)})}{\partial \pi_\alpha}
 \frac{\partial \log p(\pi|\xi^{(k)})}{\partial \pi_\beta} .
\tag{22}
\end{equation}
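The averaging over realisations can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in this work: the derivatives are taken by central finite differences, and the linear toy model, its design matrix and the noise level are hypothetical. For this Gaussian toy case the Monte Carlo estimate can be checked against the closed form for a Gaussian likelihood with parameter-independent covariance.

```python
import numpy as np

def score(log_like, pi0, data, eps=1e-4):
    """Central finite-difference gradient of log_like(pi, data) at pi0."""
    pi0 = np.asarray(pi0, dtype=float)
    grad = np.zeros(pi0.size)
    for a in range(pi0.size):
        step = np.zeros(pi0.size)
        step[a] = eps
        grad[a] = (log_like(pi0 + step, data) - log_like(pi0 - step, data)) / (2.0 * eps)
    return grad

def fisher_from_realisations(log_like, pi0, realisations):
    """Average the outer product of the score over all realisations."""
    scores = np.array([score(log_like, pi0, d) for d in realisations])
    return scores.T @ scores / len(realisations)

# Toy check with a linear model and a Gaussian likelihood (all numbers hypothetical).
rng = np.random.default_rng(1)
design = rng.normal(size=(5, 2))   # plays the role of dm_i / dpi_alpha
sigma = 0.5                        # noise per data point, C = sigma^2 * identity
pi_fid = np.array([0.25, 0.8])

def log_like(pi, data):
    resid = data - design @ pi
    return -0.5 * resid @ resid / sigma**2

data_sets = design @ pi_fid + sigma * rng.normal(size=(4000, 5))
fisher_mc = fisher_from_realisations(log_like, pi_fid, data_sets)
fisher_analytic = design.T @ design / sigma**2
print(fisher_mc)
print(fisher_analytic)
```

With a few thousand realisations the two matrices agree to within a few per cent; the scatter of the Monte Carlo estimate decreases as the inverse square root of the number of realisations.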



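In App. B we show that the expression for the Fisher matrix of
the ICA likelihood can be evaluated further to be

\begin{equation}
F_{\alpha\beta} = \sum_i
 \frac{\partial \tilde{m}_i}{\partial \pi_\alpha}
 \frac{\partial \tilde{m}_i}{\partial \pi_\beta}
 \int {\rm d}s_i \, p_{s_i}(s_i)
 \left[ \frac{{\rm d}\log p_{s_i}(s_i)}{{\rm d}s_i} \right]^2 .
\tag{23}
\end{equation}

This equation allows a simpler, alternative computation of $F$
from the estimated $p_{s_i}(s_i)$, as discussed in App. B.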

We have used Eqs. (21) and (23) to compute the Fisher
matrices for a 1500 deg$^2$ survey ($N_{\rm f} = 6000$). We fit for
four cosmological parameters ($\Omega_{\rm m}$, $\sigma_8$,
$h_{100}$, $\Omega_\Lambda$), keeping all other
parameters fixed to their true values. To visualise the posterior,
we compute two-dimensional marginalised posterior distributions
for each parameter pair as well as the one-dimensional
marginals for each parameter. The results are shown in Fig. 5.



A general feature of the ICA likelihood, which has already
been apparent in the 2D-analysis (Fig. 5), is that the credible
intervals are significantly smaller than the ones derived from
the Gaussian likelihood. For the two-dimensional marginal
distributions, the area of the 68% credible regions derived from
the ICA likelihood is smaller by $\approx$ 30-40%. The
one-dimensional constraints are tighter by $\approx$ 10-25%. In
addition, we find that the ICA Fisher ellipses in some cases are
slightly tilted with respect to those computed using the Gaussian
likelihood. This is particularly apparent for parameter combinations
involving the Hubble parameter. Note that the shift of the
maximum observed in the two-dimensional case for a single
CDFS-like field is absent here because it was assumed for the Fisher
analysis that the posterior is centred on the true parameter
values.
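The way such Fisher forecasts are turned into marginal errors and credible-region areas can be sketched as follows. This is a minimal Python illustration, assuming a Gaussian posterior with covariance $(N_{\rm f} F)^{-1}$ as stated above; the function names and the toy Fisher matrix are hypothetical and do not correspond to the matrices computed in this work.

```python
import numpy as np

def forecast_covariance(fisher_single_field, n_fields):
    """Posterior covariance (N_f F)^-1 for a survey of n_fields fields."""
    return np.linalg.inv(n_fields * np.asarray(fisher_single_field, dtype=float))

def marginal_errors(cov):
    """One-dimensional marginal standard deviations."""
    return np.sqrt(np.diag(cov))

def credible_area(cov, a, b, level=0.68):
    """Area of the 2D marginal credible ellipse for parameters a and b."""
    # Marginalising a Gaussian amounts to selecting a sub-block of the covariance.
    sub = cov[np.ix_([a, b], [a, b])]
    # Chi-square quantile for two degrees of freedom.
    delta_chi2 = -2.0 * np.log(1.0 - level)
    return np.pi * delta_chi2 * np.sqrt(np.linalg.det(sub))

fisher_toy = np.array([[4.0, 1.0], [1.0, 2.0]])   # hypothetical single-field Fisher matrix
cov_one = forecast_covariance(fisher_toy, 1)
cov_survey = forecast_covariance(fisher_toy, 6000)
print(marginal_errors(cov_survey), credible_area(cov_survey, 0, 1))
```

In this approximation the marginal errors scale as $N_{\rm f}^{-1/2}$ and the ellipse areas as $N_{\rm f}^{-1}$, so the relative difference between two likelihoods is independent of the survey size.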



4. How odd is the Chandra Deep Field South? 

4.1. The CDFS cosmic shear data 

The second part of this work is based on the cosmological
weak lensing analysis of the combined HST GEMS and
GOODS data of the CDFS (Rix et al. 2004; Giavalisco et al.
2004), which was presented in S07. The mosaic comprises 78
ACS/WFC tiles imaged in F606W, covering a total area of
~ 28' x 28'. We refer the reader to the original publication for
details on the data and weak lensing analysis, which applies the
KSB+ formalism (Kaiser et al. 1995; Luppino & Kaiser 1997;
Hoekstra et al. 1998).

In S07 the cosmic shear analysis was performed using two
different signal-to-noise and magnitude cuts. The first one
selects galaxies with S/N > 4 and has no magnitude cut, and
the second one applies a more conservative selection with
S/N > 5 and $m_{606} < 27.0$, where S/N is the shear measurement
signal-to-noise ratio as defined in Erben et al. (2001).
The drizzling process in the data reduction introduces
correlated noise in adjacent pixels. While these correlations are
ignored in the computation of S/N, an approximate correction
factor (see S07) is taken into account for S/N$_{\rm true}$,
making the above cuts S/N$_{\rm true}$ > 1.9 and
S/N$_{\rm true}$ > 2.4, respectively.
The two selection criteria yielded moderately different
$\sigma_8$-estimates of [...] and $0.59^{+[\ldots]}_{-[\ldots]}$ for
$\Omega_{\rm m} = 0.3$ (median of the posterior), not assuming a
flat Universe. The errors include the statistical and redshift
uncertainties. This translates to
$\sigma_8 = 0.57^{+[\ldots]}_{-[\ldots]}$ and
$0.65^{+[\ldots]}_{-[\ldots]}$ for our fiducial cosmology
with $\Omega_{\rm m} = 0.25$. The difference of the two estimates
was considered as a measure for the robustness and hence systematic
accuracy of our shear measurement pipeline. While the analysis
of the "Shear TEsting Programme 2" (STEP2) image simulations
(Massey et al. 2007a) indicated no significant average
shear calibration bias for our method, a detected dependence
on galaxy magnitude and size could effectively bias a
cosmic shear analysis through the redshift dependence of the
shear signal (see also Semboloni et al. 2008). In order to better
understand the difference between the two estimates found
in S07, and to exclude any remaining calibration uncertainty
in the current analysis, we further investigate the shear recovery
accuracy as a function of the signal-to-noise ratio using
the STEP2 simulations in Appendix C. Here we conclude that
our KSB+ implementation under-estimates gravitational shear
for very noisy galaxies with S/N$_{\rm true}$ < 2.5, which likely
explains the lower signal found in S07 when all galaxies with
S/N > 4 (S/N$_{\rm true}$ > 1.9) were considered. For the more
conservative selection criteria we find no significant mean shear
calibration bias and a variation as a function of magnitude and
size of < $\pm$5%. Therefore we base our current analysis on
the more robust galaxy sample with S/N > 5 (S/N$_{\rm true}$ > 2.4)
and $m_{606} < 27.0$, which yields a galaxy number density of
68 arcmin$^{-2}$. Based on the simulations, any remaining
calibration uncertainty should be negligible compared to the
statistical uncertainty.



Note that Heymans et al. (2005) found a higher estimate
of $\sigma_8 (\Omega_{\rm m}/0.3)^{0.65} = 0.68 \pm 0.13$ from GEMS,
where they extrapolated the redshift distribution from the
relatively shallow COMBO-17 photometric redshifts (Wolf et al.
2004). Using deeper data from the GOODS-MUSIC sample (Grazian et al.
2006), S07 were able to show that the COMBO-17 extrapolation
significantly underestimates the mean redshift for GEMS,
leading to the difference in the results for $\sigma_8$.

In Fig. 6 we show the posterior distribution for $\sigma_8$ based
on this sample of galaxies. For the fit, all other cosmological
parameters were held fixed at the fiducial values chosen for
our ray-tracing simulations. This avoids complications in the
discussion of cosmic variance and field selection biases due to
the effect of parameter degeneracies. We choose a flat prior for
$\sigma_8$, with a lower boundary of $\sigma_{8,{\rm min}} = 0.35$ to
cut off the tail of the posterior distribution towards small values
of the power spectrum normalisation, which is caused by the fact that
the difference (and therefore the likelihood) between the data and
the model vectors changes only very little when $\sigma_8$ (and
therefore the shear correlation function) is very small. We have
performed the fit for the ICA likelihood as well as for the Gaussian

Fig. 6. Posterior distributions for $\sigma_8$ as computed from the
CDFS data. The black solid line corresponds to the ICA
likelihood, the red dashed line is from the Gaussian likelihood whose
covariance matrix was estimated from the ray-tracing
simulations. The blue dotted line was computed from the Gaussian
likelihood with an analytically computed covariance matrix,
assuming that the shear field is Gaussian. The similarity of the
posterior densities derived from the ICA likelihood and using
the Gaussian covariance matrix is purely coincidental,
occurring only for this particular data vector.

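Table 1. Estimates of $\sigma_8$ from the CDFS

          ICA likel.                      Gaussian likel.                 Gaussian likel.
                                          (ray-tracing cov.)              (Gaussian cov.)
MAP       $0.68^{+[\ldots]}_{-[\ldots]}$  $0.59^{+0.10}_{-[\ldots]}$      $0.68^{+[\ldots]}_{-[\ldots]}$
Median    $0.62^{+[\ldots]}_{-[\ldots]}$  $0.57^{+0.15}_{-0.15}$          $0.64^{+[\ldots]}_{-[\ldots]}$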



approximation to the likelihood. For the latter, the covariance
matrix was in one case estimated from the full sample of our
ray-tracing simulations, and in the other case computed
analytically assuming that the shear field is a Gaussian random
field (Joachimi et al. 2008). The striking similarity of the
posterior densities derived from the ICA likelihood and using the
Gaussian covariance matrix for this particular data vector is
merely a coincidence and is in general not seen for our set of
simulated correlation functions.

For estimates of $\sigma_8$, we use the maximum of the posterior
(henceforth we write ICA-MAP for the maximum of the
non-Gaussian likelihood, and Gauss-MAP if the Gaussian
approximation is used), although we also quote the median (ICA
median) for comparison with S07. In the first case, our credible
intervals are highest posterior density intervals, whereas for the
median we choose to report the interval for which the probability
of $\sigma_8$ being below the lower interval boundary is as high
as being above the upper boundary. The results are summarised
in Tab. 1.
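The two interval definitions can be made concrete with a short Python sketch for a posterior tabulated on a uniform grid. It is an illustration only: the function names and the toy Gaussian posterior are assumptions, and the highest posterior density construction below presumes a unimodal posterior so that the selected grid points form a single interval.

```python
import numpy as np

def map_and_hpd(grid, post, level=0.68):
    """MAP value and highest posterior density interval on a uniform grid."""
    weights = post / post.sum()
    order = np.argsort(weights)[::-1]          # grid points, most probable first
    cumulative = np.cumsum(weights[order])
    n_keep = np.searchsorted(cumulative, level) + 1
    inside = order[:n_keep]
    return grid[np.argmax(weights)], grid[inside].min(), grid[inside].max()

def median_and_equal_tailed(grid, post, level=0.68):
    """Median and the interval with equal probability in both tails."""
    cdf = np.cumsum(post) / post.sum()
    tail = 0.5 * (1.0 - level)
    lower, median, upper = np.interp([tail, 0.5, 1.0 - tail], cdf, grid)
    return median, lower, upper

# Hypothetical Gaussian posterior, for which both definitions coincide.
grid = np.linspace(0.2, 1.2, 2001)
post = np.exp(-0.5 * ((grid - 0.7) / 0.1) ** 2)
print(map_and_hpd(grid, post))
print(median_and_equal_tailed(grid, post))
```

For a skewed posterior the MAP and the median separate and the two intervals differ, which is why both are quoted in Tab. 1.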



4.2. Cosmic Variance 

The original estimates for $\sigma_8$ given in S07 and those found
in the previous section for the Gaussian likelihood are rather

Fig. 7. Sampling distributions of the MAP estimators of $\sigma_8$,
derived from 9600 realisations of the CDFS. All other parameters
were held fixed at their fiducial values for the fit. The histogram
with red dashed lines has been obtained from the Gaussian
likelihood, the one with solid lines from the ICA likelihood. Also
shown are the best fitting Gaussian distributions. We indicate
the fiducial value of $\sigma_8$ and our estimates from the CDFS with
vertical lines.



low compared to the value reported by WMAP5 (Dunkley et al.
2009). This problem appears less severe when the full
non-Gaussian likelihood is used, but the $\sigma_8$-estimate is still
rather low. It is therefore interesting to know whether this can be
fully attributed to cosmic variance or whether the way in which the
CDFS was originally selected biases our estimates low.

To begin, we determine the probability of finding a low
$\hat{\sigma}_8$ in a CDFS-like field if the pointing is completely
random. We estimate the sampling distribution of the
$\sigma_8$-MAP estimators for Gaussian and ICA likelihoods from the
full sample of our ray-tracing simulations. We compute the posterior
likelihood for $\sigma_8$ using a uniform prior in the range
$\sigma_8 \in [0.35; 1.8]$ and determine the MAP estimator
$\hat{\sigma}_8$. As in the previous sections,
we do this using both the Gaussian and the ICA likelihoods.
To separate possible biases of the estimators from biases that
might arise because the model prediction based on Smith et al.
(2003) does not quite fit our simulations, we correct the
simulated correlation functions for this: if $\xi^{(i)}$ is the
correlation function measured in the $i$-th realisation, then

\begin{equation}
\tilde{\xi}^{(i)} = \xi^{(i)} - \langle\xi\rangle + m(\pi_0)
\tag{24}
\end{equation}

is the "re-centred" shear correlation, where $\langle\xi\rangle$ is
the mean of all realisations and $m(\pi_0)$ is our fiducial model.
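The re-centring step is a simple shift of all realisations, as the following minimal Python sketch shows. The array shapes, the toy model vector and the 10% offset of the simulations are hypothetical and serve only to illustrate the operation.

```python
import numpy as np

def recentre(xi, model_fiducial):
    """Shift all realisations so that their mean equals the fiducial model."""
    xi = np.asarray(xi, dtype=float)
    return xi - xi.mean(axis=0) + np.asarray(model_fiducial, dtype=float)

rng = np.random.default_rng(2)
model = np.array([1.0e-4, 5.0e-5, 2.0e-5])                  # hypothetical m(pi_0)
# Hypothetical simulations whose mean lies 10% above the model.
xi_sim = 1.1 * model + 1.0e-5 * rng.normal(size=(500, 3))
xi_rc = recentre(xi_sim, model)
print(xi_rc.mean(axis=0))
```

The shift changes only the mean of the sample; the scatter of the realisations about their mean, and hence the shape of the estimated likelihood, is left untouched.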

The resulting sampling distributions of $\hat{\sigma}_8$ are shown in
Figs. 7 (original $\xi$) and 8 (re-centred $\tilde{\xi}$). All the
distributions are well fit by a Gaussian. With the original
correlation functions, we obtain estimates $\hat{\sigma}_8$ which are
too high on average. This reflects the fact that the power spectrum
fitting formula by Smith et al. (2003) underpredicts the small-scale
power in the simulations (see also Hilbert et al. 2009). If we
correct for this, we see that the maximum of the ICA likelihood is a
nearly unbiased estimator of $\sigma_8$ in the one-dimensional case
considered here, and in addition has a lower variance than the
maximum of the Gaussian likelihood.



Fig. 8. Same as Fig. 7, but using re-centred correlation
functions.

Table 2. Prob($\hat{\sigma}_8 \le \hat{\sigma}_8^{\rm CDFS}$) for the CDFS

                 Gauss     Gauss       ICA       ICA
                 (MAP)     (median)    (MAP)     (median)
re-centred CF    6.8%      8.6%        12.9%     9.0%
original CF      1.8%      3.0%        5.4%      3.4%



We estimate the probability of obtaining a power spectrum
normalisation as low as the one measured in the CDFS or lower,
Prob($\hat{\sigma}_8 \le \hat{\sigma}_8^{\rm CDFS}$), by the ratio of
the number of realisations which fulfil this condition to the total
number of simulations. These estimates agree very well with those
computed from the best fitting Gaussian distribution. The results for
the MAP and median estimators are summarised in Tab. 2. As expected
from the above considerations, we find higher probabilities for the
re-centred correlation functions. In this case, the ICA-MAP
estimator yields 13% for the probability of obtaining an equally
low or lower $\hat{\sigma}_8$ than the CDFS. This reduces to
$\approx$ 5% when the uncorrected correlation functions are used,
because the misfit of our theoretical correlation functions to the
simulations biases the $\sigma_8$-estimates high. If we assume that
our simulations are a reasonable representation of the real Universe,
we can expect the same bias when we perform fits to real data.
Therefore, Prob($\hat{\sigma}_8 \le \hat{\sigma}_8^{\rm CDFS}$)
$\approx 0.05$ as derived from the uncorrected correlation functions
is most likely closest to reality. The probabilities computed from
the Gauss-MAP estimates are generally smaller than the ICA-MAP values
because of the lower value of $\hat{\sigma}_8^{\rm CDFS}$ found using
these estimators, even though the sampling
distributions of the Gauss estimators are broader.
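Both ways of estimating this probability, counting realisations and integrating the best fitting Gaussian, can be sketched in Python as follows. The mock sample of 9600 estimates and its mean, scatter and threshold are hypothetical numbers chosen for illustration; they are not the simulation results of this work.

```python
import numpy as np
from math import erf, sqrt

def empirical_prob(estimates, threshold):
    """Fraction of realisations with an estimate at or below the threshold."""
    return float(np.mean(np.asarray(estimates) <= threshold))

def gaussian_fit_prob(estimates, threshold):
    """Same probability from a Gaussian with the sample mean and variance."""
    mean = np.mean(estimates)
    std = np.std(estimates, ddof=1)
    return 0.5 * (1.0 + erf((threshold - mean) / (std * sqrt(2.0))))

rng = np.random.default_rng(3)
mock_estimates = rng.normal(loc=0.82, scale=0.09, size=9600)   # hypothetical sampling distribution
threshold = 0.68                                               # hypothetical measured value
p_count = empirical_prob(mock_estimates, threshold)
p_gauss = gaussian_fit_prob(mock_estimates, threshold)
print(p_count, p_gauss)
```

The counting estimate makes no assumption about the shape of the sampling distribution, so close agreement with the Gaussian value is itself a check that the distribution is well described by a Gaussian in the relevant tail.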



4.3. Influence of the CDFS selection criteria 

We now investigate if and by how much the way in which the 
CDFS was selected can bias our estimates of the power spec- 
trum normalisation low. Several local criteria had to be fulfilled 
by the future CDFS, such as a low galactic HI density, the ab- 
sence of bright stars and observability from certain observatory 
sites. Since these conditions do not reach beyond our galaxy, 

Fig. 9. The average values of the ICA-MAP (solid black line)
and Gauss-MAP (solid red line) estimators computed from
CDFS realisations that do not contain clusters with an X-ray
flux larger than $F_{\rm lim}$. For comparison, we also plot the
averages of the corresponding median estimators (dashed lines).



we do not expect them to affect the lensing signal by the cos- 
mological large-scale structure. 

Furthermore, the field was chosen such that no extended
X-ray sources from the ROSAT All-Sky Survey (RASS), in
particular galaxy clusters, are in the field of view. This is
potentially important, since it is known from halo-model
calculations that the cosmic shear power spectrum on intermediate and
small scales is dominated by group- and cluster-sized haloes.
Therefore, the exclusion of X-ray clusters might bias the
selection of a suitable line of sight towards under-dense fields.
On the other hand, the RASS is quite shallow and thus only
contains very luminous or nearby clusters, which have a
limited impact on the lensing signal due to their low number or
low lensing efficiency. We quantify the importance of this
criterion using the halo catalogues of our $N$-body simulations. To
each halo, we assign an X-ray luminosity in the energy range
from 0.1 to 2.4 keV using the mass-luminosity relation given in
Reiprich & Böhringer (2002) and convert this into X-ray flux
using the halo redshift. We then compute the average of the
$\hat{\sigma}_8$ estimates from all fields which do not contain a
cluster brighter than a certain flux limit. It is difficult to define
an exact overall flux limit to describe the CDFS selection, because
the RASS is rather heterogeneous. However, it is apparent from Fig. 9
that even a very conservative limit of
$10^{-13}\,{\rm erg\,s^{-1}\,cm^{-2}}$ will
change the average $\hat{\sigma}_8$ estimate by at most 3-5%. This
bias is therefore most likely not large enough to explain our CDFS
result alone.

Finally, the CDFS candidate should not contain any "relevant
NED source". This is very hard to translate into a quantitative
criterion, in particular because our simulations contain
only dark matter. We model the effect of imposing this
requirement by demanding that there be fewer than $n_{\rm halo}$
group- or cluster-sized haloes ($M > 10^{13}\,M_\odot/h$) in the
redshift range between $z = 0$ and $z = 0.5$ in a CDFS candidate.
The impact of this criterion on the estimated value of
$\hat{\sigma}_8$ using the ICA- and
Gauss-MAP estimators is shown in Fig. 10. As expected, the



[Figure 10: box plots for the subsamples all, <4, <8, <12, <16 and <20;
horizontal axis: number of haloes with $z < 0.5$, $M > 10^{13}\,M_\odot/h$]

Fig. 10. Dependence of the ICA-MAP estimator for $\sigma_8$ on the
number of group- and cluster-sized haloes $n_{\rm halo}$ between
$z = 0$ and $z = 0.5$. For each $n_{\rm halo}$-bin, we summarise the
distribution of the corresponding subsample of simulated CDFS-fields
by giving a box plot: the thick horizontal line in each box denotes
the median, the upper and lower box boundaries give the upper
and lower quartiles of the distribution of the sample values.
The error bars ("whiskers") extend to the 10% and 90% quantiles,
respectively. To visualise the tails of the distributions, the
most extreme values are given as points. The width of each
box is proportional to the square root of the sample size. For
comparison, we also show for each subsample the median of
the Gauss-MAP estimators as red crosses. The solid black
horizontal line indicates the true value of $\sigma_8$, the black
dashed line the ICA-MAP estimate for the CDFS and the red dotted line
the Gauss-MAP estimate. The average number of haloes with
$M > 10^{13}\,M_\odot/h$ and $z < 0.5$ in a CDFS-like field is
$n_{\rm halo} = 18.5$.



median $\hat{\sigma}_8$ is a monotonically increasing function of
$n_{\rm halo}$. For fields with fewer than $\approx$ 12 massive
haloes, the probability of obtaining a power spectrum normalisation
as low as in the CDFS rises above $\approx$ 20%. Given that the
average number of massive haloes in the specified redshift range is
18.5, it does not seem to be too unreasonable that fields with fewer
than $\approx$ 12 such haloes could be obtained by selecting "empty"
regions in the NED.
This is also in qualitative agreement with Phleps et al. (2007),
who find that the CDFS is underdense by a factor of $\approx$ 2 in the
redshift range from z ~ 0.2 to z ~ 0.4.

We estimate the impact of this selection criterion on the
estimates of cosmological parameters by treating the number
of haloes in the CDFS as a nuisance parameter in the
process of parameter estimation. Similar to what we did to
obtain Fig. 10, we bin the realisations of the CDFS according to
the number of group-sized haloes in the realisations. For each
bin, we obtain the mean shear correlation function and its
ratio to the mean shear correlation function of all realisations,
$r_\pm(\theta, n_{\rm halo}) = \xi_\pm(\theta, n_{\rm halo})/\xi_\pm(\theta)$.
The functions $r_+$ and $r_-$ are shown in Fig. 11. The realisations
with fewer (more) haloes than the average generally display a smaller
(larger) shear correlation function. We fit the ratios in each bin
with a double

Fig. 11. Ratios $r_+$ (upper panel) and $r_-$ (lower panel) of the
shear correlation functions in a particular $n_{\rm halo}$-bin to the
average correlation function of all realisations. The lowest (solid)
curve represents the bin with $n_{\rm halo} \in [0, 4)$, the second
lowest the bin with $n_{\rm halo} \in [4, 8)$, and so on. The highest
ratio corresponds to the bin with $n_{\rm halo} \ge 28$. The error
bars have been estimated from the field-to-field variation.



power law of the form

\begin{equation}
r_\pm(\theta, n_{\rm halo}) = A_\pm(n_{\rm halo})\,\theta^{\alpha_\pm(n_{\rm halo})}
 + B_\pm(n_{\rm halo})\,\theta^{\beta_\pm(n_{\rm halo})} .
\tag{25}
\end{equation}

For values of $n_{\rm halo}$ which do not coincide with one of the
bin centres, the functions $r_\pm$ are obtained by linear
interpolation between the fits for the two adjacent bins. With this,
we extend our model for the shear correlation function to
$m'_\pm(\theta; \pi, n_{\rm halo}) = m_\pm(\theta; \pi)\, r_\pm(\theta, n_{\rm halo})$.
In Fig. 12 we show the resulting posterior distributions for
$\sigma_8$ ($\Omega_{\rm m} = 0.25$) and $n_{\rm halo}$,
keeping all other cosmological parameters fixed and using a
uniform prior for $n_{\rm halo}$. The two-dimensional distribution
shows a weak correlation between the two parameters: as expected, a
low (high) value of $n_{\rm halo}$ requires a slightly higher (lower)
value of $\sigma_8$. The marginalised posterior for $\sigma_8$ is very
similar to the one shown in Fig. 7 where the field selection is not
taken into account. However, including $n_{\rm halo}$ increases the
MAP estimate of $\sigma_8$ by 5% to
$\sigma_8 = 0.71^{+[\ldots]}_{-[\ldots]}$ for the ICA likelihood and by
10% to $\sigma_8 = 0.65^{+[\ldots]}_{-[\ldots]}$ for the Gaussian
likelihood and the ray-tracing covariance matrix. The marginalised
posterior distribution of $n_{\rm halo}$ shows a weak peak at
$n_{\rm halo} \approx 13$ (compared to the
average of $n_{\rm halo} = 18.5$ for all ray-tracing realisations) in
the ICA case and even lower values if the Gaussian likelihood is
used. Overall, however, the posterior is very shallow.
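The construction of the halo-number-dependent model can be sketched in Python. This is a minimal illustration under stated assumptions: the ratio in each bin is taken to be a sum of two power laws in the angular scale, and the bin centres, fit coefficients and exponent names are hypothetical values invented for the example, not the fitted values of this work.

```python
import numpy as np

def double_power_law(theta, amp_a, alpha, amp_b, beta):
    """Sum of two power laws in the angular scale theta."""
    return amp_a * theta**alpha + amp_b * theta**beta

def ratio(theta, n_halo, bin_centres, fit_params):
    """Ratio r(theta, n_halo), linearly interpolated between adjacent bins."""
    theta = np.asarray(theta, dtype=float)
    curves = np.array([double_power_law(theta, *p) for p in fit_params])
    # np.interp holds the end values constant outside the range of bin_centres.
    return np.array([np.interp(n_halo, bin_centres, curves[:, j])
                     for j in range(theta.size)])

def extended_model(model_xi, theta, n_halo, bin_centres, fit_params):
    """Model correlation function rescaled by the halo-number ratio."""
    return np.asarray(model_xi, dtype=float) * ratio(theta, n_halo, bin_centres, fit_params)

theta = np.array([0.5, 2.0, 10.0])            # arcmin, hypothetical
bin_centres = np.array([2.0, 6.0, 10.0])      # hypothetical bin centres in n_halo
fit_params = [(0.5, -0.1, 0.2, 0.05),         # hypothetical (A, alpha, B, beta) per bin
              (0.6, -0.1, 0.25, 0.05),
              (0.7, -0.05, 0.3, 0.02)]
print(ratio(theta, 4.0, bin_centres, fit_params))
```

Because the interpolation is done per angular bin, the extended model is continuous in the halo number, which is what allows it to be treated as a nuisance parameter in the fit.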

Having corrected for the field selection, we can
now recompute the probabilities given in Tab. 2 for
drawing the CDFS at random. We find for the ICA-MAP
estimate Prob($\hat{\sigma}_8 \le 0.71$) = 9.4% for the original
shear correlation functions and Prob($\hat{\sigma}_8 \le 0.71$) = 18.5%
for the re-centred ones. For the Gaussian likelihood,
we find Prob($\hat{\sigma}_8 \le 0.65$) = 6.0% (original) and
Prob($\hat{\sigma}_8 \le 0.65$) = 14.9% (re-centred), respectively.

Fig. 12. Upper panel: Posterior density for $\sigma_8$
($\Omega_{\rm m} = 0.25$) and $n_{\rm halo}$ computed using the ICA
likelihood, keeping all other cosmological parameters constant.
Lower panels: Marginalised posterior densities of $\sigma_8$
($\Omega_{\rm m} = 0.25$) (left panel) and $n_{\rm halo}$ (right
panel). Solid black curves show the results from using the ICA
likelihood, dashed red lines from the Gaussian likelihood and
the ray-tracing covariance.

With this (approximate) treatment of the systematic effects
caused by the field selection, we can now put the CDFS
in context with the results from the WMAP five-year data.
For this, we fit the shear correlation function for
$\Omega_{\rm m}$ and $\sigma_8$, marginalising over $h_{100}$ (with a
Gaussian prior centred on $h_{100} = 0.7$ and a dispersion of 0.07, as
suggested by the Hubble Key Project; Freedman et al. 2001) and
$n_{\rm halo}$ with a uniform prior.
We use the WMAP Markov chain for a flat $\Lambda$CDM model
(lcdm+sz+lens; Dunkley et al. 2009; Komatsu et al. 2009),
where again we marginalise over all parameters except
$\Omega_{\rm m}$ and $\sigma_8$. The resulting posterior distributions
for the CDFS only (blue dashed contours), WMAP only (red contours) and
the combination of both measurements (thick black contours) are
shown in Fig. 13. Clearly, the joint posterior is dominated by
the WMAP data; however, the constraints from the CDFS
allow us to exclude parameter combinations where both
$\Omega_{\rm m}$ and $\sigma_8$ are large. We find the MAP estimates
$\Omega_{\rm m} = 0.26^{+[\ldots]}_{-[\ldots]}$ and
$\sigma_8 = 0.79^{+[\ldots]}_{-[\ldots]}$ when marginalising over the
other parameter.

Fig. 13. Posterior density for $\Omega_{\rm m}$ and $\sigma_8$, where
we have marginalised over the Hubble constant $h_{100}$ and the number
of haloes in the field $n_{\rm halo}$. The dashed blue contours show
the 68%, 95% and 99% credible regions resulting from the cosmic
shear analysis of the CDFS (using the ICA likelihood), the red
contours show the posterior from the WMAP 5-year data (using
the flat $\Lambda$CDM model). The combined posterior is shown
with thick black contours.



Finally, note that the two criteria discussed in this section 
are not strictly independent. However, it is highly improba- 
ble that a single field will contain more than one massive halo 
above the X-ray flux limit. Therefore, selecting fields without 
an X-ray-bright cluster prior to performing the steps that lead 
to Fig. 10 would change the halo numbers that go into the analysis
by at most one and would not significantly influence the
foregoing discussion. 

5. Summary and discussion 

In this paper, we have investigated the validity of the approx- 
imation of a Gaussian likelihood for the cosmic shear correla- 
tion function, which is routinely made in weak lensing studies. 
We have described a method to estimate the likelihood from 
a large set of ray-tracing simulations. The algorithm tries to
find a new set of (non-orthogonal) basis vectors with respect 
to which the components of the shear correlation functions be- 
come approximately statistically independent. This then allows 
us to estimate the high-dimensional likelihood as a product of 
one-dimensional probability distributions. A drawback of this 
method is that quite a large sample of realistically simulated 
correlation functions is required to get good results for the tails 
of the likelihood. However, this should become less problematic
in the near future when increasingly large ray-tracing
simulations become available.



We have investigated how the constraints on matter and
vacuum energy density, Hubble parameter and power spectrum
normalisation depend on the shape of the likelihood for a
survey composed of 0.5 deg x 0.5 deg fields and a redshift
distribution similar to the CDFS. We find that if the non-Gaussianity
of the likelihood is taken into account, the posterior likelihood
becomes more sharply peaked and skewed. When fitting only
for $\Omega_{\rm m}$ and $\sigma_8$, the maximum of the posterior is
shifted towards lower $\Omega_{\rm m}$ and higher $\sigma_8$, and the
area of the 68% highest posterior density credible region decreases
by about 40% compared to the case of a Gaussian likelihood. For the
four-dimensional parameter space, we have conducted a Fisher matrix
analysis to obtain lower limits on the errors achievable with a
1500 deg$^2$ survey. As in the two-dimensional case, we find the most
important effect to be that the error bars decrease by 10-40%
compared to the Gaussian likelihood. Less severe is the slight
tilt of the Fisher ellipses when marginalising over two of the
four parameters, particularly when $h_{100}$ is involved.

In the second part of this work, we have presented a
re-analysis of the CDFS-HST data. Using the non-Gaussian
likelihood, we find $\sigma_8 = 0.68^{+[\ldots]}_{-[\ldots]}$ for
$\Omega_{\rm m} = 0.25$ (keeping all
other parameters fixed to their fiducial values), compared to
$\sigma_8 = 0.59^{+[\ldots]}_{-[\ldots]}$ obtained from the Gaussian
likelihood with a covariance matrix estimated from the ray-tracing
simulations. We have then tried to quantify how (un-)likely it is to
randomly select a field with the characteristics of the Chandra
Deep Field South with a power spectrum normalisation this
low. We have used 9600 ray-tracing realisations of the CDFS
to estimate the sampling distribution of the ICA-MAP
estimator for $\sigma_8$. For our fiducial, WMAP5-like cosmology, we
find that Prob($\hat{\sigma}_8 \le 0.68$) $\approx$ 5%, assuming that
the location of the CDFS on the sky was chosen randomly. The fact
that the CDFS was selected not to contain an extended X-ray source in
the ROSAT All-Sky Survey can lead to a bias of the estimated
$\hat{\sigma}_8$ by at most 5%. This is because the clusters excluded
by this criterion are rare and mostly at low redshifts, and therefore
not very lensing-efficient. The second relevant selection criterion
is that the CDFS should not contain any relevant NED source.
We model this by selecting only those fields which contain a
specific number $n_{\rm halo}$ of group- and cluster-sized haloes. We
find that for those realisations for which the number of such
haloes is below the average, the estimates of $\sigma_8$ can be
biased low by about 5-10%. We include this effect in our
likelihood analysis by extending our model shear correlation
function by a correction factor depending on $n_{\rm halo}$ and
treating $n_{\rm halo}$ as a nuisance parameter. This increases the
estimate of $\sigma_8$ by 5% to
$\sigma_8 = 0.71^{+[\ldots]}_{-[\ldots]}$ for the ICA likelihood and
by 10% to $\sigma_8 = 0.65^{+[\ldots]}_{-[\ldots]}$ for the Gaussian
likelihood. This procedure also yields tentative evidence that the
number of massive haloes in the CDFS is only $\approx$ 70% of the
average, in qualitative agreement with the findings of
Phleps et al. (2007).

Finally, we combine the CDFS cosmic shear results with the
constraints on cosmological parameters from the WMAP experiment.
We fit for $\Omega_{\rm m}$ and $\sigma_8$, where we marginalise
over the Hubble constant and take into account the field selection
bias by marginalising also over $n_{\rm halo}$. While the posterior
is clearly dominated by the WMAP data, the CDFS still allows us to
exclude parts of the parameter space with high values of both
$\Omega_{\rm m}$ and $\sigma_8$. Assuming a flat Universe, the MAP
estimates for these two parameters are
$\Omega_{\rm m} = 0.26^{+\ldots}_{-\ldots}$ and
$\sigma_8 = 0.79^{+\ldots}_{-\ldots}$.

Appendix A: Projection Pursuit Density Estimation

In order to have an independent check of the ICA-based likelihood
estimation algorithm, we employ the method of projection pursuit
density estimation (PPDE; Friedman et al. 1984). Like our ICA
method, PPDE aims to estimate the joint probability density
$p(\boldsymbol{x})$ of a random vector $\boldsymbol{x}$, given a
set of observations of $\boldsymbol{x}$. As starting point, an
initial model $p_0(\boldsymbol{x})$ for the multidimensional
probability distribution $p(\boldsymbol{x})$ has to be provided,
for which a reasonable choice is e.g. a multivariate Gaussian with
a covariance matrix estimated from the data. The method then
identifies the direction $\boldsymbol{\theta}_1$ along which the
marginalised model distribution differs most from the marginalised
density of the data points and corrects for the discrepancy along
the direction $\boldsymbol{\theta}_1$ by multiplying $p_0$ with a
correction factor. This yields a refined density estimate
$p_1(\boldsymbol{x})$, which can be further improved by iteratively
applying the outlined procedure.
More formally, the PPDE density estimate is of the form

$$p_M(\boldsymbol{x}) = p_0(\boldsymbol{x}) \prod_{m=1}^{M} f_m(\boldsymbol{\theta}_m \cdot \boldsymbol{x}) , \tag{A.1}$$



where $p_M$ is the estimate after $M$ iterations of the procedure
and $p_0$ is the initial model. The univariate functions $f_m$ are
multiplicative corrections to the initial model along the
directions $\boldsymbol{\theta}_m$. The density estimate can be
obtained iteratively using the relation
$p_M(\boldsymbol{x}) = p_{M-1}(\boldsymbol{x})\, f_M(\boldsymbol{\theta}_M \cdot \boldsymbol{x})$.
At the $M$-th step of the iteration, a direction
$\boldsymbol{\theta}_M$ and a function $f_M$ are chosen to minimise
the Kullback-Leibler divergence (Kullback & Leibler 1951) between
the actual data density $p(\boldsymbol{x})$ and the density
estimate $p_M(\boldsymbol{x})$,

$$D_{\rm KL}[p, p_M] = \int d\boldsymbol{x}\; p(\boldsymbol{x}) \log \frac{p(\boldsymbol{x})}{p_M(\boldsymbol{x})} , \tag{A.2}$$



as a goodness-of-fit measure. The Kullback-Leibler divergence
provides a "distance measure" between two probability distribution
functions $p$ and $q$, since it is non-negative and zero only if
$p = q$, albeit not symmetric. Only the cross term

$$W(\boldsymbol{\theta}_M, f_M) = -\int d\boldsymbol{x}\; p(\boldsymbol{x}) \log p_M(\boldsymbol{x}) \tag{A.3}$$

of the K-L divergence is relevant for the minimisation; all other
terms do not depend on $\boldsymbol{\theta}_M$ and $f_M$. By using
Eq. (A.1), one sees that the minimum of $W$ is attained at the same
location as the minimum of

$$w(\boldsymbol{\theta}_M, f_M) = -\int d\boldsymbol{x}\; p(\boldsymbol{x}) \log f_M(\boldsymbol{\theta}_M \cdot \boldsymbol{x}) , \tag{A.4}$$



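which is minus the expectation value of $\log f_M$ with respect to
$p(\boldsymbol{x})$. The data density $p(\boldsymbol{x})$ is
unknown; however, the data comprise a set of $N$ samples
$\boldsymbol{x}_i$ from this distribution. The expectation value
in Eq. (A.4) can therefore be estimated by the sample mean,

$$\hat{w}(\boldsymbol{\theta}_M, f_M) = -\frac{1}{N} \sum_{i=1}^{N} \log f_M(\boldsymbol{\theta}_M \cdot \boldsymbol{x}_i) . \tag{A.5}$$

For fixed $\boldsymbol{\theta}_M$, the minimum of Eq. (A.4) is
attained for

$$f_M(\boldsymbol{\theta}_M \cdot \boldsymbol{x}) = \frac{p^{\boldsymbol{\theta}_M}(\boldsymbol{\theta}_M \cdot \boldsymbol{x})}{p_{M-1}^{\boldsymbol{\theta}_M}(\boldsymbol{\theta}_M \cdot \boldsymbol{x})} , \tag{A.6}$$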



where $p^{\boldsymbol{\theta}_M}$ and
$p_{M-1}^{\boldsymbol{\theta}_M}$ are the marginal densities of the
data and of the model density from the $(M-1)$-st iteration along
the direction $\boldsymbol{\theta}_M$, respectively. With this, the
iterative process that leads to estimates of
$\boldsymbol{\theta}_M$ and $f_M$ schematically consists of:

- choosing a direction $\boldsymbol{\theta}_M$,

- computing the marginal densities $p^{\boldsymbol{\theta}_M}$ and $p_{M-1}^{\boldsymbol{\theta}_M}$,

- computing $f_M(\boldsymbol{\theta}_M \cdot \boldsymbol{x})$ according to Eq. (A.6),

- computing $w(\boldsymbol{\theta}_M, f_M)$,

- choosing a new $\boldsymbol{\theta}_M$ that decreases $w$,

- continuing from step 2 until a convergence criterion is fulfilled.
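The steps listed above can be sketched for a single PPDE iteration in two dimensions. This is a simplified illustration under stated assumptions, not the implementation used in this work: the marginals are estimated with plain histograms, the direction is found by a brute-force scan over angles rather than an optimiser, and all names (`binned_density`, `ppde_direction_search`) and the toy data are hypothetical.

```python
import numpy as np

def binned_density(values, edges):
    """Histogram estimate of a 1D marginal density, floored to avoid log(0)."""
    hist, _ = np.histogram(values, bins=edges, density=True)
    return np.maximum(hist, 1e-12)

def ppde_direction_search(data, model_sample, angles, n_bins=40):
    """Scan trial directions; return (w_hat, angle, edges, f) at the minimum."""
    best = None
    for phi in angles:
        theta = np.array([np.cos(phi), np.sin(phi)])   # trial direction
        t_data = data @ theta                           # projected data
        t_model = model_sample @ theta                  # projected model sample
        lo = min(t_data.min(), t_model.min())
        hi = max(t_data.max(), t_model.max())
        edges = np.linspace(lo, hi, n_bins + 1)
        # Eq. (A.6): ratio of data and model marginals along theta
        f = binned_density(t_data, edges) / binned_density(t_model, edges)
        # Eq. (A.5): sample estimate of w, evaluated at the data points
        idx = np.clip(np.searchsorted(edges, t_data, side="right") - 1,
                      0, n_bins - 1)
        w_hat = -np.mean(np.log(f[idx]))
        if best is None or w_hat < best[0]:
            best = (w_hat, phi, edges, f)
    return best

# Toy data (assumption): Gaussian along axis 0, uniform along axis 1.
rng = np.random.default_rng(0)
n = 20000
data = np.column_stack([rng.normal(size=n),
                        rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=n)])
# Initial model p_0: Gaussian with the sample mean and covariance,
# represented by a Monte Carlo sample.
model_sample = rng.multivariate_normal(data.mean(axis=0),
                                       np.cov(data, rowvar=False), size=n)
angles = np.radians(np.arange(0, 180, 5))
w_best, phi_best, edges_best, f_best = ppde_direction_search(
    data, model_sample, angles)
print(np.degrees(phi_best), w_best)
```

In this toy setup the scan should select a direction close to the uniform (non-Gaussian) axis, since that is where the data marginal departs most from the Gaussian model. A full PPDE would multiply the model by the tabulated `f_best` and repeat with a fresh model sample.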



To efficiently compute the marginals $p^{\boldsymbol{\theta}_M}$
and $p_{M-1}^{\boldsymbol{\theta}_M}$, Monte Carlo samples of these
densities are used. Note that the data already comprise a sample of
$p(\boldsymbol{x})$; a sample of $p_{M-1}$ can be obtained
efficiently by an iterative method: since $p_{M-1}$ is similar to
$p_{M-2}$, a subset of the sample from $p_{M-1}$ can be obtained by
rejection sampling from the sample from the $(M-2)$-nd step. The
remaining data vectors are then drawn by rejection sampling from
$p_0$. For more technical details of the estimation procedure, we
refer the reader to Friedman et al. (1984).

Note that the PPDE technique, although using a methodology very
similar to that of our ICA-based procedure, differs in the
important point that it does not rely on the assumption that a
linear transformation of the data leads to statistical
independence of the components of the transformed data vectors. It
therefore provides a good test of the validity of this
approximation.


Appendix B: Fisher matrix of the ICA likelihood 

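In this appendix, we give the derivation of Eq. (23). In the
general case, the Fisher matrix is given by (e.g. Kendall et al.
1987)

$$F_{\alpha\beta} = \left\langle \frac{\partial \log L}{\partial \pi_\alpha} \frac{\partial \log L}{\partial \pi_\beta} \right\rangle . \tag{B.1}$$

In our case, the likelihood depends on cosmological parameters
only through the difference between data and model vector, i.e.
$\boldsymbol{s} = \boldsymbol{\xi} - \boldsymbol{m}$ (see Eq. 8).
This allows us to write

\begin{align}
\frac{\partial \log L(\boldsymbol{s}(\boldsymbol{\pi}))}{\partial \pi_\alpha}
 &= \sum_i \frac{\partial \log L(\boldsymbol{s})}{\partial s_i} \frac{\partial s_i}{\partial \pi_\alpha} \tag{B.2} \\
 &= \sum_i \frac{\partial \log p_{s_i}(s_i)}{\partial s_i} \frac{\partial s_i}{\partial \pi_\alpha} , \tag{B.3}
\end{align}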



where in the last step we have made use of the fact that the 
likelihood factorises in the ICA basis. The expression for the 
Fisher matrix then can be written as 

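$$F_{\alpha\beta} = \sum_{i,j} \left\langle \frac{\partial \log p_{s_i}(s_i)}{\partial s_i} \frac{\partial \log p_{s_j}(s_j)}{\partial s_j} \right\rangle \frac{\partial m_i}{\partial \pi_\alpha} \frac{\partial m_j}{\partial \pi_\beta} . \tag{B.4}$$

Next, we compute the expectation value on the right hand side
and obtain

\begin{align}
F_{\alpha\beta} &= \sum_{i \neq j} \frac{\partial m_i}{\partial \pi_\alpha} \frac{\partial m_j}{\partial \pi_\beta} \int ds_i\, \frac{\partial p_{s_i}(s_i)}{\partial s_i} \int ds_j\, \frac{\partial p_{s_j}(s_j)}{\partial s_j} \tag{B.5} \\
 &\quad + \sum_i \frac{\partial m_i}{\partial \pi_\alpha} \frac{\partial m_i}{\partial \pi_\beta} \int ds_i\, p_{s_i}(s_i) \left[ \frac{\partial \log p_{s_i}(s_i)}{\partial s_i} \right]^2 . \tag{B.6}
\end{align}

The integrals in the first term of the right hand side vanish since
the $p_{s_i}$ drop to zero for very large and small values of
$s_i$. This leaves us with

$$F_{\alpha\beta} = \sum_i \frac{\partial m_i}{\partial \pi_\alpha} \frac{\partial m_i}{\partial \pi_\beta} \int ds_i\, p_{s_i}(s_i) \left[ \frac{\partial \log p_{s_i}(s_i)}{\partial s_i} \right]^2 . \tag{B.7}$$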



The derivatives in Eq. (B.7) can be strongly affected by noise in
the estimated $p_{s_i}(s_i)$, in particular in the tails of the
distributions. For their numerical computation, we therefore
choose the following four-point finite difference operator
(Abramowitz & Stegun 1964):

$$\frac{dp}{ds} \approx \frac{p(s - 2h) - 8p(s - h) + 8p(s + h) - p(s + 2h)}{12h} , \tag{B.8}$$

which we find to be more stable against this problem than its
more commonly used two-point counterpart. Because of this
potential difficulty, we cross-check our results with the
alternative method provided by Eq. (22). This method is
significantly slower, but numerically simpler. This is because the
derivatives of the log-likelihood in Eq. (22) are on average
computed close to the maximum-likelihood point, where the
likelihood estimate is well sampled. Reassuringly, we find
excellent agreement between the two methods. Finally, we have
investigated the influence of the choice of the kernel function
$K$ in Eq. (10), which might affect the computation of the
numerical derivatives. Our results prove to be stable against
variation of $K$, provided that we choose a differentiable kernel
function.
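The four-point operator of Eq. (B.8) and the one-dimensional integral entering Eq. (B.7) can be checked on a case with a known answer. The sketch below is illustrative only: it uses an analytic Gaussian in place of an estimated $p_{s_i}$, for which the integral equals $1/\sigma^2$, and the function names and grid settings are assumptions of this example.

```python
import numpy as np

def four_point_derivative(p, s, h):
    """Eq. (B.8): four-point finite-difference estimate of dp/ds."""
    return (p(s - 2 * h) - 8 * p(s - h) + 8 * p(s + h) - p(s + 2 * h)) / (12 * h)

def two_point_derivative(p, s, h):
    """Common central two-point estimate of dp/ds, for comparison."""
    return (p(s + h) - p(s - h)) / (2 * h)

def fisher_integral(p, grid, h):
    """Integral of p (dlog p/ds)^2 = (dp/ds)^2 / p over a regular grid."""
    dens = p(grid)
    dp = four_point_derivative(p, grid, h)
    integrand = dp**2 / np.maximum(dens, 1e-300)   # guard against division by 0
    return np.sum(integrand) * (grid[1] - grid[0])

SIGMA = 2.0  # width of the toy density (assumption)

def gaussian(s):
    return np.exp(-0.5 * (s / SIGMA) ** 2) / (SIGMA * np.sqrt(2.0 * np.pi))

grid = np.linspace(-20.0, 20.0, 4001)
fisher = fisher_integral(gaussian, grid, h=0.05)   # analytic value: 1 / SIGMA**2

# Accuracy of the two stencils at one point with a deliberately coarse step.
s0, h0 = 1.0, 0.5
exact = -s0 / SIGMA**2 * gaussian(s0)
err_four = abs(four_point_derivative(gaussian, s0, h0) - exact)
err_two = abs(two_point_derivative(gaussian, s0, h0) - exact)
print(fisher, err_four, err_two)
```

For a smooth density the four-point stencil has a truncation error of order $h^4$ compared with $h^2$ for the two-point form; its behaviour on a noisy density estimate, which is the concern raised in the text, is not tested by this analytic example.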

Appendix C: Further conclusions for our KSB+ 
pipeline from the STEP simulations 

In this appendix we assume that the reader is familiar with basic
KSB notation. For a short introduction and a summary of
differences between various implementations see Heymans et al.
(2006).



Within the Shear TEsting Programme (STEP), simulated images
containing sheared galaxies are analysed in blind tests in order
to test the shear measurement accuracy of weak lensing pipelines.
In these analyses the shear recovery accuracy has been quantified
in terms of a multiplicative calibration bias $m$ and additive PSF
residuals $c$. From the analysis of the first set of simulations
(STEP1, Heymans et al. 2006), which mimic simplified ground-based
observations, we find that our KSB+ implementation significantly
under-estimates gravitational shear on average if no calibration
correction is applied. After the elimination of selection and
weighting-related effects this shear calibration bias amounts to a
relative under-estimation of $m = -9\%$. According to our testing
the largest contribution to this bias originates from the
inversion of the $P^g$-tensor, which describes the response of
galaxy ellipticity to gravitational shear. While a full-tensor
inversion reduces this bias, it strongly increases the measurement
noise (see also Erben et al. 2001) and dependence on galaxy
selection criteria. We therefore decided to stick to the commonly
applied approximation of $(P^g)^{-1} = 2/{\rm Tr}[P^g]$, which we
measure from individual galaxies, and correct the shear estimate
using a multiplicative calibration factor of $c_{\rm cal} = 1.10$
in the S07 analysis. This average calibration correction was found
to be stable to the ∼2% level between different STEP1 simulation
subsets. However, note that the bias depends on the details of the
KSB implementation, which might explain some of the scatter
between the results for different KSB codes in STEP1. In
particular, we identified a strong dependence on the choice of the
Gaussian filter scale $r_g$, which is used in the computation of
the KSB brightness moments. For example, changing from our default
$r_g = 1.0\, r_f$, where $r_f$ is the flux radius as measured by
SExtractor (Bertin & Arnouts 1996), to $r_g = 0.7\, r_f$ worsens
the bias to $m = -17\%$.
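As a short worked check of the numbers quoted above, the sketch below relates a multiplicative bias to the calibration factor that undoes it. It assumes the usual STEP convention $\gamma_{\rm obs} = (1 + m)\,\gamma_{\rm true} + c$; the function names are hypothetical.

```python
def calibration_factor(m):
    """Factor that undoes a multiplicative bias m in g_obs = (1 + m) * g_true."""
    return 1.0 / (1.0 + m)

def implied_bias(c_cal):
    """Multiplicative bias that a calibration factor c_cal would exactly remove."""
    return 1.0 / c_cal - 1.0

c_cal_from_bias = calibration_factor(-0.09)   # bias of -9% quoted for STEP1
bias_from_factor = implied_bias(1.10)         # factor used in the S07 analysis
print(round(c_cal_from_bias, 3), round(bias_from_factor, 3))
```

Under this assumption a bias of $-9\%$ corresponds to a factor of about 1.10, consistent with the value adopted in the text.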

The average calibration correction likewise proved to be robust
for the second set of image simulations (STEP2, Massey et al.
2007a), which also mimics ground-based data but takes into account
more complex PSFs and galaxy morphologies by applying the
shapelets formalism (Massey et al. 2004). Yet, the STEP2 analysis
revealed a significant magnitude dependence of the shear recovery
accuracy for our implementation, with a strong deterioration at
faint magnitudes. In this analysis we applied the same
signal-to-noise cut S/N > 4.0 as in STEP1 (KSB S/N as defined in
Erben et al. 2001), where we however ignored the strong noise
correlations present in the STEP2 data, which were added to mimic
the influence of drizzle.

In the case of uncorrelated noise the dispersion of the sum over
the pixel values of $N$ pixels scales as

$$\sigma_N = \sqrt{N}\, \sigma_1 , \tag{C.1}$$

where $\sigma_1$ is the dispersion computed from single pixel
values. Drizzling, or convolution in the case of the STEP2
simulations, reduces $\sigma_1$ but introduces correlations
between neighbouring pixels. The signal-to-noise of an object is
usually defined as the ratio of the summed object flux convolved
with some window or weight function, divided by an rms estimate
for the noise in an equal area convolved with the same weight
function. If the noise estimate is computed from $\sigma_1$ and
scaled according to Eq. (C.1), the correlations are neglected and
the noise estimate is too small.

STEP: http://www.physics.ubc.ca/~heymans/step.html

Fig. C.1. Estimate of the effective influence of the noise
correlations in the STEP2 simulations: Plotted is the ratio $r$ of
the pixel value dispersion $\sigma_N^{\rm measure}$ measured from
large areas of $N = M^2$ pixels to the estimate from the normal
single pixel dispersion $\sqrt{N}\, \sigma_1^{\rm measure}$ as a
function of $M$, determined from an object-free STEP2 image. In
the absence of noise correlations $r = 1$ for all $M$. The value
$r \simeq 2.8$ for $M \to \infty$ gives the factor by which the
signal-to-noise is over-estimated when measured from the single
pixel dispersion $\sigma_1^{\rm measure}$ ignoring the
correlations.

In order to estimate the effective influence of the noise
correlations in STEP2, we use a pure noise image which was
provided together with the simulated images. We compute the rms of
the pixel sum $\sigma_N^{\rm measure}$ in independent square
sub-regions of the image with side length $M = \sqrt{N}$ and
determine the ratio

$$r = \frac{\sigma_N^{\rm measure}}{\sqrt{N}\, \sigma_1^{\rm measure}} , \tag{C.2}$$

which in the absence of correlated noise would be equal to 1 for
all $N$. In the presence of noise correlations it will for large
$N$ converge to the factor by which $\sigma_1^{\rm measure}$
under-estimates the uncorrelated $\sigma_1$. This can be
understood as drizzling or convolution typically re-distributes
pixel flux within a relatively small area. As soon as this kernel
is much smaller than the area spanned by $M^2$ pixels, the
correlations become unimportant for the area pixel sum. The
measured $r(M)$ is plotted in Fig. C.1. Extrapolating to
$M \to \infty$ we estimate that ordinary noise measures based on
the single pixel dispersion, which ignore the noise correlation,
will over-estimate the signal-to-noise of objects by a factor
$r \simeq 2.8$. Hence, our original selection criterion S/N > 4.0
corresponds to a very low true cut S/N$_{\rm true}$ > 1.4,
including much noisier objects than in STEP1.
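The measurement of $r(M)$ in Eq. (C.2) can be illustrated on synthetic noise. In the sketch below the correlated image is white noise smoothed with a periodic 3 × 3 boxcar, which is only a toy stand-in for the drizzle-like correlations of the STEP2 images; the function names, image size and kernel are assumptions of this example.

```python
import numpy as np

def noise_ratio(image, M):
    """Eq. (C.2): rms of M x M block sums over sqrt(N) times the pixel rms."""
    ny, nx = image.shape
    by, bx = ny // M, nx // M
    # cut the image into independent M x M blocks and sum each block
    blocks = image[:by * M, :bx * M].reshape(by, M, bx, M)
    block_sums = blocks.sum(axis=(1, 3))
    return block_sums.std() / (M * image.std())    # sqrt(N) = M

def boxcar_smooth(image, k):
    """Periodic k x k boxcar average, introducing short-range correlations."""
    out = np.zeros_like(image)
    for dy in range(k):
        for dx in range(k):
            out += np.roll(image, (dy, dx), axis=(0, 1))
    return out / (k * k)

rng = np.random.default_rng(1)
white = rng.normal(size=(512, 512))
correlated = boxcar_smooth(white, 3)

r_white = noise_ratio(white, 32)       # close to 1 for uncorrelated noise
r_corr = noise_ratio(correlated, 32)   # approaches 3 for a 3 x 3 boxcar

# Converting the quoted cut with the factor r = 2.8 from the text:
sn_true_cut = 4.0 / 2.8
print(r_white, r_corr, sn_true_cut)
```

For the boxcar toy model the ratio tends to the kernel side length for blocks much larger than the kernel, which mirrors the convergence of $r(M)$ described above. Dividing the nominal cut S/N > 4.0 by $r \simeq 2.8$ gives the true cut of about 1.4 quoted in the text.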

We plot the dependence of our STEP2 shear estimate on the
(uncorrected) S/N in Fig. C.2. For S/N < 7, corresponding to
S/N$_{\rm true}$ < 2.5, a significant deterioration of the shear
signal occurs, with a mean calibration bias
$|\langle m \rangle| \sim 10\%$ and a large scatter between the
different PSF models. We conclude that this approximately marks
the limit down to which our KSB+ implementation can reliably
measure shear. If we apply a modified cut S/N > 7.0 to the STEP2
galaxies, the resulting magnitude and size dependence of the shear
calibration bias is < ±5% (top panels in Fig. C.3). The remaining
galaxies are best corrected with a slightly reduced calibration
factor $c_{\rm cal} = 1.08$, which we apply for the plots shown in
this appendix and the updated shear analysis presented in this
paper. The difference between the calibration corrections derived
from STEP1 and STEP2 agrees with the estimated ∼2% accuracy. Note
that the error increases for the highly elliptical PSFs D and E
($e^* \simeq 12\%$) in STEP2, for which in addition significant
PSF anisotropy residuals occur (bottom panels in Fig. C.3). This
should however not affect our analysis given that typical ACS PSF
ellipticities rarely exceed $e^* \simeq 7\%$, see e.g. S07.

Fig. C.2. Calibration bias $m$ as a function of the uncorrected
KSB signal-to-noise S/N for the TS analysis of the STEP2
simulations. Thin solid (dashed) lines show $\gamma_1$
($\gamma_2$) estimates for individual PSFs, where we show
individual error-bars only for one PSF for clarity. The bold solid
line and error-bars show the mean and standard deviation of the
individual PSF estimates and shear components. Note the
deterioration of the shear estimate for the STEP2 galaxies with
S/N < 7 (S/N$_{\rm true}$ < 2.5). For this plot an adapted
calibration correction of 1.08 was applied.

[Figure C.3: panels labelled STEP2, S/N > 7; horizontal axes: r-magnitude and size in arcsec.]

Fig. C.3. Calibration bias $m$ and PSF residuals $c$ as a function
of input galaxy magnitude and size for our refined analysis of the
STEP2 data. Thin solid (dashed) lines show $\gamma_1$ ($\gamma_2$)
estimates for individual PSFs, where we include individual
error-bars only for one PSF for clarity. Bold solid lines and
error-bars show the mean and standard deviation of the individual
PSF estimates and shear components. In this plot only galaxies
with S/N > 7 (S/N$_{\rm true}$ > 2.5) are taken into account,
which strongly reduces the deterioration found in Massey et al.
(2007a) for faint magnitudes. For this plot an adapted calibration
correction of 1.08 was applied.

References

Abramowitz, M. & Stegun, I. A. 1964, Handbook of Mathematical
  Functions with Formulas, Graphs, and Mathematical Tables, ninth
  edn. (New York: Dover)
Benjamin, J., Heymans, C., Semboloni, E., et al. 2007, MNRAS, 381, 702
Bernstein, G. M. & Jarvis, M. 2002, AJ, 123, 583
Bertin, E. & Arnouts, S. 1996, A&AS, 117, 393
Chiu, K.-C., Liu, Z.-Y., & Xu, L. 2003, in Proc. 4th International
  Symposium on Independent Component Analysis and Blind Signal
  Separation (ICA2003), Nara, Japan, 751-756
Comon, P., Jutten, C., & Herault, J. 1991, Signal Processing, 24, 11
Cooray, A. & Hu, W. 2001, ApJ, 554, 56
Davison, A. C. 2003, Statistical Models, Cambridge Series in
  Statistical and Probabilistic Mathematics (Cambridge University
  Press)
Dunkley, J., Komatsu, E., Nolta, M. R., et al. 2009, ApJS, 180, 306
Eifler, T., Schneider, P., & Hartlap, J. 2008, astro-ph/0810.4254
Erben, T., van Waerbeke, L., Bertin, E., Mellier, Y., &
  Schneider, P. 2001, A&A, 366, 717
Freedman, W. L., Madore, B. F., Gibson, B. K., et al. 2001, ApJ, 553, 47
Friedman, J., Stuetzle, W., & Schroeder, A. 1984, Journal of the
  American Statistical Association, 79, 599
Fu, L., Semboloni, E., Hoekstra, H., et al. 2008, A&A, 479, 9
Gelman, A., Carlin, J. B., Stern, H., & Rubin, D. B. 2004,
  Bayesian Data Analysis (Chapman & Hall/CRC)
Giacconi, R., Rosati, P., Tozzi, P., et al. 2001, ApJ, 551, 624
Giavalisco, M., Ferguson, H. C., Koekemoer, A. M., et al. 2004,
  ApJ, 600, L93
Grazian, A., Fontana, A., de Santis, C., et al. 2006, A&A, 449, 951
Hartlap, J., Simon, P., & Schneider, P. 2007, A&A, 464, 399
Hastie, T., Tibshirani, R., & Friedman, J. 2001, The Elements of
  Statistical Learning (Springer)
Heymans, C., Brown, M. L., Barden, M., et al. 2005, MNRAS, 361, 160
Heymans, C., Van Waerbeke, L., Bacon, D., et al. 2006, MNRAS, 368, 1323
Hilbert, S., Hartlap, J., White, S. D. M., & Schneider, P. 2009,
  A&A, 499, 31
Hoekstra, H., Franx, M., Kuijken, K., & Squires, G. 1998, ApJ, 504, 636
Hoekstra, H., Mellier, Y., van Waerbeke, L., et al. 2006, ApJ, 647, 116
Hyvärinen, A., Karhunen, J., & Oja, E. 2001, Independent
  Component Analysis (Wiley Interscience)
Hyvärinen, A. & Oja, E. 1997, Neural Computation, 9(7), 1483
Hyvärinen, A. & Oja, E. 2000, Neural Networks, 13(4-5), 411
Jain, B., Seljak, U., & White, S. 2000, ApJ, 530, 547
Joachimi, B., Schneider, P., & Eifler, T. 2008, A&A, 477, 43
Jutten, C. & Herault, J. 1991, Signal Processing, 24, 1
Kaiser, N. & Pan-STARRS Collaboration. 2005, in Bulletin of the
  American Astronomical Society, Vol. 37, 465
Kaiser, N., Squires, G., & Broadhurst, T. 1995, ApJ, 449, 460
Kendall, M. G., Stuart, A., & Ord, J. K., eds. 1987, Kendall's
  advanced theory of statistics (New York, NY, USA: Oxford
  University Press, Inc.)
Komatsu, E., Dunkley, J., Nolta, M. R., et al. 2009, ApJS, 180, 330
Kuijken, K. 2006, A&A, 456, 827
Kullback, S. & Leibler, R. A. 1951, Annals of Mathematical
  Statistics, 22, 79
Liang, J.-J., Fang, K.-T., Hickernell, F. J., & Li, R. 2001, Math.
  Comput., 70, 337
Luppino, G. A. & Kaiser, N. 1997, ApJ, 475, 20
Massey, R., Heymans, C., Berge, J., et al. 2007a, MNRAS, 376, 13
Massey, R., Refregier, A., Conselice, C. J., & Bacon, D. J. 2004,
  MNRAS, 348, 214
Massey, R., Rhodes, J., Leauthaud, A., et al. 2007b, ApJS, 172, 239
Miller, L., Kitching, T. D., Heymans, C., Heavens, A. F., & van
  Waerbeke, L. 2007, MNRAS, 382, 315
Peacock, J. A. & Dodds, S. J. 1996, MNRAS, 280, 19
Phleps, S., Wolf, C., Peacock, J. A., Meisenheimer, K., & van
  Kampen, E. 2007, in Astronomical Society of the Pacific
  Conference Series, Vol. 379, Cosmic Frontiers, ed. N. Metcalfe &
  T. Shanks, 327
Press, W. et al. 1992, Numerical Recipes in C (Cambridge
  University Press)
R Development Core Team. 2007, R: A Language and Environment for
  Statistical Computing, R Foundation for Statistical Computing,
  Vienna, Austria
Refregier, A. & Bacon, D. 2003, MNRAS, 338, 48
Reiprich, T. H. & Böhringer, H. 2002, ApJ, 567, 716
Rix, H.-W., Barden, M., Beckwith, S. V. W., et al. 2004, ApJS, 152, 163
Schneider, P. 2006, in Saas-Fee Advanced Course 33: Gravitational
  Lensing: Strong, Weak and Micro, ed. G. Meylan, P. Jetzer,
  P. North, P. Schneider, C. S. Kochanek, & J. Wambsganss, 269-451
Schneider, P., van Waerbeke, L., Kilbinger, M., & Mellier, Y.
  2002, A&A, 396, 1
Schrabback, T., Erben, T., Simon, P., et al. 2007, A&A, 468, 823
Scoccimarro, R., Zaldarriaga, M., & Hui, L. 1999, ApJ, 527, 1




Scott, D. W. 1992, Multivariate Density Estimation: Theory,
  Practice, and Visualization (New York: John Wiley & Sons)
Semboloni, E., Mellier, Y., van Waerbeke, L., et al. 2006, A&A, 452, 51
Semboloni, E., Tereno, I., van Waerbeke, L., & Heymans, C. 2008,
  astro-ph/0812.1881
Semboloni, E., van Waerbeke, L., Heymans, C., et al. 2007, MNRAS, 375, 6
Silverman, B. W. 1986, Density Estimation (London: Chapman and Hall)
Smail, I., Hogg, D. W., Yan, L., & Cohen, J. G. 1995, ApJ, 449, 105
Smith, R. E., Peacock, J. A., Jenkins, A., et al. 2003, MNRAS, 341, 1311
Springel, V. 2005, MNRAS, 364, 1105
Springel, V., White, S. D. M., Jenkins, A., et al. 2005, Nature, 435, 629
Takada, M. & Jain, B. 2009, MNRAS, 395, 2065
Venables, W. & Ripley, B. 2002, Modern Applied Statistics with S
  (Springer)
Wolf, C., Meisenheimer, K., Kleinheinrich, M., et al. 2004, A&A, 421, 913



